
Commit 11e8c7e

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones.

ARM:

 - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start the EOIcount deactivation walk *after* the last irq that made it into an LR

 - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only is this not possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change

 - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context

 - Address yet another syzkaller finding in the vgic initialisation, where we would end up destroying an uninitialised vgic with nasty consequences

 - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned

 - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults

PPC:

 - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart

RISC-V:

 - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG register access, AIA CSR access, float register access, and PMU counter access

 - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

 - Fix a potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei()

 - Fix an off-by-one array access in SBI PMU

 - Skip the THP support check during dirty logging

 - Fix the error code returned by the Smstateen and Ssaia ONE_REG interface

 - Check the host Ssaia extension when creating the AIA irqchip

x86:

 - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them

 - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls

 - Fix a brown paper bag bug in add_atomic_switch_msr()

 - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu

 - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level)

 - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled

 - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of the userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors that want to freeze PMCs on VM-Entry

 - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks

Generic:

 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end

 - Document that vcpu->mutex is taken outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive

Selftests:

 - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
  Documentation: kvm: fix formatting of the quirks table
  KVM: x86: clarify leave_smm() return value
  selftests: kvm: add a test that VMX validates controls on RSM
  selftests: kvm: extract common functionality out of smm_test.c
  KVM: SVM: check validity of VMCB controls when returning from SMM
  KVM: VMX: check validity of VMCS controls when returning from SMM
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
  KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
  KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
  KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
  KVM: x86: synthesize CPUID bits only if CPU capability is set
  KVM: PPC: e500: Rip out "struct tlbe_ref"
  KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
  KVM: selftests: Increase 'maxnode' for guest_memfd tests
  KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
  KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
  KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
  ...
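Several of the RISC-V fixes above close speculative out-of-bounds windows with array_index_nospec(). As a hedged userspace model of the branch-free clamp behind that helper (mirroring the generic fallback in <linux/nospec.h>; the *_model names are illustrative, not the kernel API), the idea can be sketched as:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * All-ones when index < size, all-zeroes otherwise, computed without a
 * branch so a mispredicted bounds check cannot steer a speculative
 * out-of-bounds load.  Relies on arithmetic right shift of a negative
 * long, which is implementation-defined in ISO C but arithmetic on the
 * compilers the kernel supports.
 */
static unsigned long array_index_mask_nospec_model(unsigned long index,
						   unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Clamp 'index' to [0, size): in-bounds indices pass, others become 0. */
static unsigned long array_index_nospec_model(unsigned long index,
					      unsigned long size)
{
	return index & array_index_mask_nospec_model(index, size);
}
```

An out-of-range index is forced to 0 rather than rejected, so callers still perform their architectural bounds check first; the clamp only neutralizes the speculative path.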
2 parents 4f3df2e + d2ea4ff commit 11e8c7e

57 files changed

Lines changed: 758 additions & 388 deletions


Documentation/virt/kvm/api.rst

Lines changed: 117 additions & 109 deletions
Large diffs are not rendered by default.

Documentation/virt/kvm/locking.rst

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ The acquisition orders for mutexes are as follows:
 
 - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
 
+- vcpu->mutex is taken outside kvm->slots_lock and kvm->slots_arch_lock
+
 - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
   them together is quite rare.
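The added documentation line records an ordering rule rather than changing code. As a rough userspace sketch of honouring it (pthread mutexes standing in for vcpu->mutex and kvm->slots_lock; all names here are illustrative, not kernel API):

```c
#include <assert.h>
#include <pthread.h>

/*
 * Model of the documented ordering: vcpu->mutex is taken outside
 * (i.e. before) kvm->slots_lock.  Taking the two locks in the same
 * order everywhere is what rules out an ABBA deadlock.
 */
static pthread_mutex_t vcpu_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

static int generation;

static int touch_slots_under_vcpu(void)
{
	int g;

	pthread_mutex_lock(&vcpu_mutex);	/* outer lock first */
	pthread_mutex_lock(&slots_lock);	/* inner lock second */
	g = ++generation;			/* protected state */
	pthread_mutex_unlock(&slots_lock);	/* release inner... */
	pthread_mutex_unlock(&vcpu_mutex);	/* ...then outer */
	return g;
}
```

Any path that instead took slots_lock first and then vcpu_mutex could deadlock against this one, which is why the order is worth documenting even when it is unintuitive.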

arch/arm64/include/asm/kvm_host.h

Lines changed: 3 additions & 0 deletions
@@ -784,6 +784,9 @@ struct kvm_host_data {
 	/* Number of debug breakpoints/watchpoints for this CPU (minus 1) */
 	unsigned int debug_brps;
 	unsigned int debug_wrps;
+
+	/* Last vgic_irq part of the AP list recorded in an LR */
+	struct vgic_irq *last_lr_irq;
 };
 
 struct kvm_host_psci_config {

arch/arm64/kernel/cpufeature.c

Lines changed: 9 additions & 0 deletions
@@ -2345,6 +2345,15 @@ static bool can_trap_icv_dir_el1(const struct arm64_cpu_capabilities *entry,
 	    !is_midr_in_range_list(has_vgic_v3))
 		return false;
 
+	/*
+	 * pKVM prevents late onlining of CPUs. This means that whatever
+	 * state the capability is in after deprivilege cannot be affected
+	 * by a new CPU booting -- this is guaranteed to be a CPU we have
+	 * already seen, and the cap is therefore unchanged.
+	 */
+	if (system_capabilities_finalized() && is_protected_kvm_enabled())
+		return cpus_have_final_cap(ARM64_HAS_ICH_HCR_EL2_TDIR);
+
 	if (is_kernel_in_hyp_mode())
 		res.a1 = read_sysreg_s(SYS_ICH_VTR_EL2);
 	else

arch/arm64/kvm/at.c

Lines changed: 0 additions & 2 deletions
@@ -1504,8 +1504,6 @@ int __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
 		fail = true;
 	}
 
-	isb();
-
 	if (!fail)
 		par = read_sysreg_par();

arch/arm64/kvm/guest.c

Lines changed: 2 additions & 2 deletions
@@ -29,7 +29,7 @@
 
 #include "trace.h"
 
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS()
 };
 
@@ -42,7 +42,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
 	sizeof(kvm_vm_stats_desc),
 };
 
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
 	KVM_GENERIC_VCPU_STATS(),
 	STATS_DESC_COUNTER(VCPU, hvc_exit_stat),
 	STATS_DESC_COUNTER(VCPU, wfe_exit_stat),

arch/arm64/kvm/hyp/nvhe/mem_protect.c

Lines changed: 1 addition & 1 deletion
@@ -518,7 +518,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		granule = kvm_granule_size(level);
 		cur.start = ALIGN_DOWN(addr, granule);
 		cur.end = cur.start + granule;
-		if (!range_included(&cur, range))
+		if (!range_included(&cur, range) && level < KVM_PGTABLE_LAST_LEVEL)
 			continue;
 		*range = cur;
 		return 0;

arch/arm64/kvm/mmu.c

Lines changed: 9 additions & 5 deletions
@@ -1751,6 +1751,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 		force_pte = (max_map_size == PAGE_SIZE);
 		vma_pagesize = min_t(long, vma_pagesize, max_map_size);
+		vma_shift = __ffs(vma_pagesize);
 	}
 
 	/*
@@ -1837,10 +1838,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && s2_force_noncacheable)
 		ret = -ENOEXEC;
 
-	if (ret) {
-		kvm_release_page_unused(page);
-		return ret;
-	}
+	if (ret)
+		goto out_put_page;
 
 	/*
 	 * Guest performs atomic/exclusive operations on memory with unsupported
@@ -1850,7 +1849,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(vcpu))) {
 		kvm_inject_dabt_excl_atomic(vcpu, kvm_vcpu_get_hfar(vcpu));
-		return 1;
+		ret = 1;
+		goto out_put_page;
 	}
 
 	if (nested)
@@ -1936,6 +1936,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mark_page_dirty_in_slot(kvm, memslot, gfn);
 
 	return ret != -EAGAIN ? ret : 0;
+
+out_put_page:
+	kvm_release_page_unused(page);
+	return ret;
 }
 
 /* Resolve the access fault by making the page young again. */
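The mmu.c change above converges every early bailout on a single out_put_page label that drops the page reference, instead of duplicating the release at each return. A hedged userspace sketch of that single-exit cleanup pattern (malloc/free standing in for the pinned page; all names illustrative, not kernel API):

```c
#include <assert.h>
#include <stdlib.h>

static int pages_released;

/* Stand-in for kvm_release_page_unused(): drop the unused page. */
static void release_page(void *page)
{
	free(page);
	pages_released++;
}

/*
 * Every early error path funnels through 'out_put_page', so adding a
 * new bailout cannot silently leak the page the way a forgotten
 * duplicate release could.
 */
static int fault_path(int fail_early, int excl_atomic_fault)
{
	int ret = 0;
	void *page = malloc(64);	/* stands in for the pinned page */

	if (!page)
		return -1;

	if (fail_early) {		/* e.g. the -ENOEXEC case above */
		ret = -2;
		goto out_put_page;
	}

	if (excl_atomic_fault) {	/* fault injected, page unused */
		ret = 1;
		goto out_put_page;
	}

	free(page);			/* success: page consumed elsewhere */
	return 0;

out_put_page:
	release_page(page);
	return ret;
}
```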

arch/arm64/kvm/nested.c

Lines changed: 16 additions & 11 deletions
@@ -152,31 +152,31 @@ static int get_ia_size(struct s2_walk_info *wi)
 	return 64 - wi->t0sz;
 }
 
-static int check_base_s2_limits(struct s2_walk_info *wi,
+static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
-	int start_size, ia_size;
+	int start_size, pa_max;
 
-	ia_size = get_ia_size(wi);
+	pa_max = kvm_get_pa_bits(vcpu->kvm);
 
 	/* Check translation limits */
 	switch (BIT(wi->pgshift)) {
 	case SZ_64K:
-		if (level == 0 || (level == 1 && ia_size <= 42))
+		if (level == 0 || (level == 1 && pa_max <= 42))
 			return -EFAULT;
 		break;
 	case SZ_16K:
-		if (level == 0 || (level == 1 && ia_size <= 40))
+		if (level == 0 || (level == 1 && pa_max <= 40))
 			return -EFAULT;
 		break;
 	case SZ_4K:
-		if (level < 0 || (level == 0 && ia_size <= 42))
+		if (level < 0 || (level == 0 && pa_max <= 42))
 			return -EFAULT;
 		break;
 	}
 
 	/* Check input size limits */
-	if (input_size > ia_size)
+	if (input_size > pa_max)
 		return -EFAULT;
 
 	/* Check number of entries in starting level table */
@@ -269,16 +269,19 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
 	if (input_size > 48 || input_size < 25)
 		return -EFAULT;
 
-	ret = check_base_s2_limits(wi, level, input_size, stride);
-	if (WARN_ON(ret))
+	ret = check_base_s2_limits(vcpu, wi, level, input_size, stride);
+	if (WARN_ON(ret)) {
+		out->esr = compute_fsc(0, ESR_ELx_FSC_FAULT);
 		return ret;
+	}
 
 	base_lower_bound = 3 + input_size - ((3 - level) * stride +
 			   wi->pgshift);
 	base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
 
 	if (check_output_size(wi, base_addr)) {
-		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+		/* R_BFHQH */
+		out->esr = compute_fsc(0, ESR_ELx_FSC_ADDRSZ);
 		return 1;
 	}
 
@@ -293,8 +296,10 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
 
 		paddr = base_addr | index;
 		ret = read_guest_s2_desc(vcpu, paddr, &desc, wi);
-		if (ret < 0)
+		if (ret < 0) {
+			out->esr = ESR_ELx_FSC_SEA_TTW(level);
 			return ret;
+		}
 
 		new_desc = desc;

arch/arm64/kvm/vgic/vgic-init.c

Lines changed: 16 additions & 16 deletions
@@ -143,6 +143,21 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vgic_model = type;
 	kvm->arch.vgic.implementation_rev = KVM_VGIC_IMP_REV_LATEST;
+	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
+
+	aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
+	pfr1 = kvm_read_vm_id_reg(kvm, SYS_ID_PFR1_EL1) & ~ID_PFR1_EL1_GIC;
+
+	if (type == KVM_DEV_TYPE_ARM_VGIC_V2) {
+		kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
+	} else {
+		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
+		aa64pfr0 |= SYS_FIELD_PREP_ENUM(ID_AA64PFR0_EL1, GIC, IMP);
+		pfr1 |= SYS_FIELD_PREP_ENUM(ID_PFR1_EL1, GIC, GICv3);
+	}
+
+	kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
+	kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		ret = vgic_allocate_private_irqs_locked(vcpu, type);
@@ -157,25 +172,10 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 			vgic_cpu->private_irqs = NULL;
 		}
 
+		kvm->arch.vgic.vgic_model = 0;
 		goto out_unlock;
 	}
 
-	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-
-	aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
-	pfr1 = kvm_read_vm_id_reg(kvm, SYS_ID_PFR1_EL1) & ~ID_PFR1_EL1_GIC;
-
-	if (type == KVM_DEV_TYPE_ARM_VGIC_V2) {
-		kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
-	} else {
-		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
-		aa64pfr0 |= SYS_FIELD_PREP_ENUM(ID_AA64PFR0_EL1, GIC, IMP);
-		pfr1 |= SYS_FIELD_PREP_ENUM(ID_PFR1_EL1, GIC, GICv3);
-	}
-
-	kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
-	kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);
-
 	if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
 		kvm->arch.vgic.nassgicap = system_supports_direct_sgis();
