aboutsummaryrefslogtreecommitdiff
path: root/virt/kvm
AgeCommit message (Collapse)Author
2012-07-26KVM: remove dummy pagesXiao Guangrong
Currently, kvm allocates some pages and use them as error indicators, it wastes memory and is not good for scalability Base on Avi's suggestion, we use the error codes instead of these pages to indicate the error conditions Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26KVM: use kvm_release_page_clean to release the pageXiao Guangrong
In kvm_async_pf_wakeup_all, it uses bad_page to generate broadcast wakeup, and uses put_page to release bad_page, the work depends on the fact that bad_page is the normal page. But we will use the error code instead of bad_page, so use kvm_release_page_clean to release the page which will release the error code properly Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26Merge branch 'queue' into nextAvi Kivity
Merge patches queued during the run-up to the merge window. * queue: (25 commits) KVM: Choose better candidate for directed yield KVM: Note down when cpu relax intercepted or pause loop exited KVM: Add config to support ple or cpu relax optimzation KVM: switch to symbolic name for irq_states size KVM: x86: Fix typos in pmu.c KVM: x86: Fix typos in lapic.c KVM: x86: Fix typos in cpuid.c KVM: x86: Fix typos in emulate.c KVM: x86: Fix typos in x86.c KVM: SVM: Fix typos KVM: VMX: Fix typos KVM: remove the unused parameter of gfn_to_pfn_memslot KVM: remove is_error_hpa KVM: make bad_pfn static to kvm_main.c KVM: using get_fault_pfn to get the fault pfn KVM: MMU: track the refcount when unmap the page KVM: x86: remove unnecessary mark_page_dirty KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range() KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp() KVM: MMU: Add memslot parameter to hva handlers ... Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-24Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...
2012-07-23KVM: Choose better candidate for directed yieldRaghavendra K T
Currently, on a large vcpu guests, there is a high probability of yielding to the same vcpu who had recently done a pause-loop exit or cpu relax intercepted. Such a yield can lead to the vcpu spinning again and hence degrade the performance. The patchset keeps track of the pause loop exit/cpu relax interception and gives chance to a vcpu which: (a) Has not done pause loop exit or cpu relax intercepted at all (probably he is preempted lock-holder) (b) Was skipped in last iteration because it did pause loop exit or cpu relax intercepted, and probably has become eligible now (next eligible lock holder) Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-23KVM: Note down when cpu relax intercepted or pause loop exitedRaghavendra K T
Noting pause loop exited vcpu or cpu relax intercepted helps in filtering right candidate to yield. Wrong selection of vcpu; i.e., a vcpu that just did a pl-exit or cpu relax intercepted may contribute to performance degradation. Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-23KVM: Add config to support ple or cpu relax optimzationRaghavendra K T
Suggested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-20KVM: switch to symbolic name for irq_states sizeMichael S. Tsirkin
Use PIC_NUM_PINS instead of hard-coded 16 for pic pins. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: fix race with level interruptsMichael S. Tsirkin
When more than 1 source id is in use for the same GSI, we have the following race related to handling irq_states race: CPU 0 clears bit 0. CPU 0 read irq_state as 0. CPU 1 sets level to 1. CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0). Now ioapic thinks the level is 0 but irq_state is not 0. Fix by performing all irq_states bitmap handling under pic/ioapic lock. This also removes the need for atomics with irq_states handling. Reported-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: remove the unused parameter of gfn_to_pfn_memslotXiao Guangrong
The parameter, 'kvm', is not used in gfn_to_pfn_memslot, we can happily remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: make bad_pfn static to kvm_main.cXiao Guangrong
bad_pfn is not used out of kvm_main.c, so mark it static, also move it near hwpoison_pfn and fault_pfn Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19KVM: using get_fault_pfn to get the fault pfnXiao Guangrong
Using get_fault_pfn to cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: Introduce kvm_unmap_hva_range() for ↵Takuya Yoshikawa
kvm_mmu_notifier_invalidate_range_start() When we tested KVM under memory pressure, with THP enabled on the host, we noticed that MMU notifier took a long time to invalidate huge pages. Since the invalidation was done with mmu_lock held, it not only wasted the CPU but also made the host harder to respond. This patch mitigates this by using kvm_handle_hva_range(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Alexander Graf <agraf@suse.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-11KVM: Fix device assignment threaded irq handlerAlex Williamson
The kernel no longer allows us to pass NULL for the hard handler without also specifying IRQF_ONESHOT. IRQF_ONESHOT imposes latency in the exit path that we don't need for MSI interrupts. Long term we'd like to inject these interrupts from the hard handler when possible. In the short term, we can create dummy hard handlers that return us to the previous behavior. Credit to Michael for original patch. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43328 Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-06KVM: handle last_boosted_vcpu = 0 caseRik van Riel
If last_boosted_vcpu == 0, then we fall through all test cases and may end up with all VCPUs pouncing on vcpu 0. With a large enough guest, this can result in enormous runqueue lock contention, which can prevent vcpu0 from running, leading to a livelock. Changing < to <= makes sure we properly handle that case. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-03KVM: fix fault page leakXiao Guangrong
fault_page is forgot to be freed Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-02KVM: Sanitize KVM_IRQFD flagsAlex Williamson
We only know of one so far. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-02KVM: Pass kvm_irqfd to functionsAlex Williamson
Prune this down to just the struct kvm_irqfd so we can avoid changing function definition for every flag or field we use. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-06-18KVM: use KVM_CAP_IRQ_ROUTING to protect the routing related codeMarc Zyngier
The KVM code sometimes uses CONFIG_HAVE_KVM_IRQCHIP to protect code that is related to IRQ routing, which not all in-kernel irqchips may support. Use KVM_CAP_IRQ_ROUTING instead. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-15KVM: Fix PCI header check on device assignmentJan Kiszka
The masking was wrong (must have been 0x7f), and there is no need to re-read the value as pci_setup_device already does this for us. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43339 Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-06-05KVM: Fix buffer overflow in kvm_set_irq()Avi Kivity
kvm_set_irq() has an internal buffer of three irq routing entries, allowing connecting a GSI to three IRQ chips or on MSI. However setup_routing_entry() does not properly enforce this, allowing three irqchip routes followed by an MSI route to overflow the buffer. Fix by ensuring that an MSI entry is added to an empty list. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-05KVM: Avoid wasting pages for small lpage_info arraysTakuya Yoshikawa
lpage_info is created for each large level even when the memory slot is not for RAM. This means that when we add one slot for a PCI device, we end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc(). To make things worse, there is an increasing number of devices which would result in more pages being wasted this way. This patch mitigates this problem by using kvm_kvzalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-05KVM: Separate out dirty_bitmap allocation code as kvm_kvzalloc()Takuya Yoshikawa
Will be used for lpage_info allocation later. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-30KVM: s390: Implement the directed yield (diag 9c) hypervisor call for KVMKonstantin Weitz
This patch implements the directed yield hypercall found on other System z hypervisors. It delegates execution time to the virtual cpu specified in the instruction's parameter. Useful to avoid long spinlock waits in the guest. Christian Borntraeger: moved common code in virt/kvm/ Signed-off-by: Konstantin Weitz <WEITZKON@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-24KVM: Introduce direct MSI message injection for in-kernel irqchipsJan Kiszka
Currently, MSI messages can only be injected to in-kernel irqchips by defining a corresponding IRQ route for each message. This is not only unhandy if the MSI messages are generated "on the fly" by user space, IRQ routes are a limited resource that user space has to manage carefully. By providing a direct injection path, we can both avoid using up limited resources and simplify the necessary steps for user land. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-19Merge branch 'linus' into queueMarcelo Tosatti
Merge reason: development work has dependency on kvm patches merged upstream. Conflicts: Documentation/feature-removal-schedule.txt Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-19KVM: lock slots_lock around device assignmentAlex Williamson
As pointed out by Jason Baron, when assigning a device to a guest we first set the iommu domain pointer, which enables mapping and unmapping of memory slots to the iommu. This leaves a window where this path is enabled, but we haven't synchronized the iommu mappings to the existing memory slots. Thus a slot being removed at that point could send us down unexpected code paths removing non-existent pinnings and iommu mappings. Take the slots_lock around creating the iommu domain and initial mappings as well as around iommu teardown to avoid this race. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-16KVM: dont clear TMR on EOIMichael S. Tsirkin
Intel spec says that TMR needs to be set/cleared when IRR is set, but kvm also clears it on EOI. I did some tests on a real (AMD based) system, and I see same TMR values both before and after EOI, so I think it's a minor bug in kvm. This patch fixes TMR to be set/cleared on IRR set only as per spec. And now that we don't clear TMR, we can save an atomic read of TMR on EOI that's not propagated to ioapic, by checking whether ioapic needs a specific vector first and calculating the mode afterwards. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-11KVM: unmap pages from the iommu when slots are removedAlex Williamson
We've been adding new mappings, but not destroying old mappings. This can lead to a page leak as pages are pinned using get_user_pages, but only unpinned with put_page if they still exist in the memslots list on vm shutdown. A memslot that is destroyed while an iommu domain is enabled for the guest will therefore result in an elevated page reference count that is never cleared. Additionally, without this fix, the iommu is only programmed with the first translation for a gpa. This can result in peer-to-peer errors if a mapping is destroyed and replaced by a new mapping at the same gpa as the iommu will still be pointing to the original, pinned memory address. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-08KVM: Remove unused dirty_bitmap_head and nr_dirty_pagesTakuya Yoshikawa
Now that we do neither double buffering nor heuristic selection of the write protection method these are not needed anymore. Note: some drivers have their own implementation of set_bit_le() and making it generic needs a bit of work; so we use test_and_set_bit_le() and will later replace it with generic set_bit_le(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: fix kvm_vcpu_kick build failure on S390Marcelo Tosatti
S390's kvm_vcpu_stat does not contain halt_wakeup member. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: Factor out kvm_vcpu_kick to arch-generic codeChristoffer Dall
The kvm_vcpu_kick function performs roughly the same funcitonality on most all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch structure and to accomodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function kvm_arch_vcpu_wq have been defined. For all other architectures this is a generic inline that just returns &vcpu->wq; Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: resize kvm_io_range array dynamicallyAmos Kong
This patch makes the kvm_io_range array can be resized dynamically. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-20KVM: Convert intx_mask_lock to spin lockJan Kiszka
As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under rcu_read_lock, we cannot use a mutex in the latter function. Switch to a spin lock to address this. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: use correct tlbs dirty type in cmpxchgAlex Shi
Using 'int' type is not suitable for a 'long' object. So, correct it. Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Allow host IRQ sharing for assigned PCI 2.3 devicesJan Kiszka
PCI 2.3 allows to generically disable IRQ sources at device level. This enables us to share legacy IRQs of such devices with other host devices when passing them to a guest. The new IRQ sharing feature introduced here is optional, user space has to request it explicitly. Moreover, user space can inform us about its view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the interrupt and signaling it if the guest masked it via the virtualized PCI config space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Ensure all vcpus are consistent with in-kernel irqchip settingsAvi Kivity
If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: mmu_notifier: Flush TLBs before releasing mmu_lockTakuya Yoshikawa
Other threads may process the same page in that small window and skip TLB flush and then return before these functions do flush. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Introduce kvm_memory_slot::arch and move lpage_info into itTakuya Yoshikawa
Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Simplify ifndef conditional usage in __kvm_set_memory_region()Takuya Yoshikawa
Narrow down the controlled text inside the conditional so that it will include lpage_info and rmap stuff only. For this we change the way we check whether the slot is being created from "if (npages && !new.rmap)" to "if (npages && !old.npages)". We also stop checking if lpage_info is NULL when we create lpage_info because we do it from inside the slot creation code block. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Split lpage_info creation out from __kvm_set_memory_region()Takuya Yoshikawa
This makes it easy to make lpage_info architecture specific. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Introduce gfn_to_index() which returns the index for a given levelTakuya Yoshikawa
This patch cleans up the code and removes the "(void)level;" warning suppressor. Note that we can also use this for PT_PAGE_TABLE_LEVEL to treat every level uniformly later. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05KVM: Move gfn_to_memslot() to kvm_host.hPaul Mackerras
This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to kvm_host.h to reduce the code duplication caused by the need for non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call gfn_to_memslot() in real mode. Rather than putting gfn_to_memslot() itself in a header, which would lead to increased code size, this puts __gfn_to_memslot() in a header. Then, the non-modular uses of gfn_to_memslot() are changed to call __gfn_to_memslot() instead. This way there is only one place in the source code that needs to be changed should the gfn_to_memslot() implementation need to be modified. On powerpc, the Book3S HV style of KVM has code that is called from real mode which needs to call gfn_to_memslot() and thus needs this. (Module code is allocated in the vmalloc region, which can't be accessed in real mode.) With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05KVM: fix error handling for out of range irqMichael S. Tsirkin
find_index_from_host_irq returns 0 on error but callers assume < 0 on error. This should not matter much: an out of range irq should never happen since irq handler was registered with this irq #, and even if it does we get a spurious msix irq in guest and typically nothing terrible happens. Still, better to make it consistent. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05KVM: Add barriers to allow mmu_notifier_retry to be used locklesslyPaul Mackerras
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give the correct answer when called without kvm->mmu_lock being held. PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than a single global spinlock in order to improve the scalability of updates to the guest MMU hashed page table, and so needs this. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05KVM: s390: ucontrol: export SIE control block to userCarsten Otte
This patch exports the s390 SIE hardware control block to userspace via the mapping of the vcpu file descriptor. In order to do so, a new arch callback named kvm_arch_vcpu_fault is introduced for all architectures. It allows to map architecture specific pages. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05KVM: s390: add parameter for KVM_CREATE_VMCarsten Otte
This patch introduces a new config option for user controlled kernel virtual machines. It introduces a parameter to KVM_CREATE_VM that allows to set bits that alter the capabilities of the newly created virtual machine. The parameter is passed to kvm_arch_init_vm for all architectures. The only valid modifier bit for now is KVM_VM_S390_UCONTROL. This requires CAP_SYS_ADMIN privileges and creates a user controlled virtual machine on s390 architectures. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-02-01KVM: Fix __set_bit() race in mark_page_dirty() during dirty loggingTakuya Yoshikawa
It is possible that the __set_bit() in mark_page_dirty() is called simultaneously on the same region of memory, which may result in only one bit being set, because some callers do not take mmu_lock before mark_page_dirty(). This problem is hard to produce because when we reach mark_page_dirty() beginning from, e.g., tdp_page_fault(), mmu_lock is being held during __direct_map(): making kvm-unit-tests' dirty log api test write to two pages concurrently was not useful for this reason. So we have confirmed that there can actually be race condition by checking if some callers really reach there without holding mmu_lock using spin_is_locked(): probably they were from kvm_write_guest_page(). To fix this race, this patch changes the bit operation to the atomic version: note that nr_dirty_pages also suffers from the race but we do not need exactly correct numbers for now. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-01-13module_param: make bool parameters really bool (drivers & misc)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-10Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits) iommu/amd: Set IOTLB invalidation timeout iommu/amd: Init stats for iommu=pt iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume iommu/amd: Add invalidate-context call-back iommu/amd: Add amd_iommu_device_info() function iommu/amd: Adapt IOMMU driver to PCI register name changes iommu/amd: Add invalid_ppr callback iommu/amd: Implement notifiers for IOMMUv2 iommu/amd: Implement IO page-fault handler iommu/amd: Add routines to bind/unbind a pasid iommu/amd: Implement device aquisition code for IOMMUv2 iommu/amd: Add driver stub for AMD IOMMUv2 support iommu/amd: Add stat counter for IOMMUv2 events iommu/amd: Add device errata handling iommu/amd: Add function to get IOMMUv2 domain for pdev iommu/amd: Implement function to send PPR completions iommu/amd: Implement functions to manage GCR3 table iommu/amd: Implement IOMMUv2 TLB flushing routines iommu/amd: Add support for IOMMUv2 domain mode iommu/amd: Add amd_iommu_domain_direct_map function ...