* [PATCH 0/4] ARM: mmu: ensure PTE is written at once
From: Ahmad Fatoum @ 2025-05-21 17:41 UTC
To: barebox
Writing the PTE with a normal, non-volatile access is a bad idea, because
a speculative page-table walk may already consume a partially written entry.

Use WRITE_ONCE to turn this into a volatile write via a new set_pte
helper. The extra helper is useful for type enforcement and for future
extension once we start to consistently apply break-before-make[1].

[1]: https://lore.kernel.org/all/87a5btrcyr.wl-maz@kernel.org/
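To illustrate, the mechanical part of the change is small; a minimal sketch
of the arm32 variant from patch 3 (simplified, not the full diff):

  static void set_pte(uint32_t *pt, uint32_t val)
  {
          /* a single volatile store the compiler cannot tear, drop or merge */
          WRITE_ONCE(*pt, val);
  }

  /* before: a plain store the compiler is free to split or defer */
  ttb[ttb_idx] = (unsigned long)table | PMD_TYPE_TABLE;
  /* after: */
  set_pte(&ttb[ttb_idx], (unsigned long)table | PMD_TYPE_TABLE);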
Ahmad Fatoum (4):
arch: sync READ_ONCE/WRITE_ONCE with Linux
ARM64: mmu: document granule size calculation
ARM: mmu: ensure PTE is written at once
WIP: break before make and don't invalidate uncached regions being
remapped
Documentation/devel/porting.rst | 2 +
arch/arm/cpu/mmu_32.c | 247 +++++++++++++++++++++++-----
arch/arm/cpu/mmu_64.c | 38 ++++-
arch/arm/include/asm/barrier.h | 8 +
arch/kvx/include/asm/barrier.h | 2 +
arch/mips/include/asm/barrier.h | 8 +
arch/openrisc/include/asm/barrier.h | 8 +
arch/powerpc/include/asm/barrier.h | 8 +
arch/sandbox/include/asm/barrier.h | 8 +
arch/x86/include/asm/barrier.h | 8 +
include/asm-generic/barrier.h | 24 +++
include/asm-generic/rwonce.h | 87 ++++++++++
include/linux/bitops.h | 2 +-
include/linux/compiler.h | 94 -----------
include/linux/compiler_types.h | 46 ++++++
include/linux/list.h | 1 +
16 files changed, 449 insertions(+), 142 deletions(-)
create mode 100644 arch/arm/include/asm/barrier.h
create mode 100644 arch/mips/include/asm/barrier.h
create mode 100644 arch/openrisc/include/asm/barrier.h
create mode 100644 arch/powerpc/include/asm/barrier.h
create mode 100644 arch/sandbox/include/asm/barrier.h
create mode 100644 arch/x86/include/asm/barrier.h
create mode 100644 include/asm-generic/barrier.h
create mode 100644 include/asm-generic/rwonce.h
--
2.39.5
* [PATCH 1/4] arch: sync READ_ONCE/WRITE_ONCE with Linux
From: Ahmad Fatoum @ 2025-05-21 17:41 UTC
To: barebox; +Cc: Ahmad Fatoum
In preparation for using READ_ONCE/WRITE_ONCE in the MMU code, let's
sync the implementation with Linux.

This removes ACCESS_ONCE, which has only a single user in barebox
outside of scripts/ anyway.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
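A conversion note, not part of the diff: reads of ACCESS_ONCE(x) become
READ_ONCE(x) and writes of the form ACCESS_ONCE(x) = val become
WRITE_ONCE(x, val). The bitops.h hunk below shows the read case; the write
case (hypothetical here) would look like:

  old = READ_ONCE(*ptr);     /* was: old = ACCESS_ONCE(*ptr); */
  WRITE_ONCE(*ptr, new);     /* was: ACCESS_ONCE(*ptr) = new; */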
Documentation/devel/porting.rst | 2 +
arch/arm/include/asm/barrier.h | 8 +++
arch/kvx/include/asm/barrier.h | 2 +
arch/mips/include/asm/barrier.h | 8 +++
arch/openrisc/include/asm/barrier.h | 8 +++
arch/powerpc/include/asm/barrier.h | 8 +++
arch/sandbox/include/asm/barrier.h | 8 +++
arch/x86/include/asm/barrier.h | 8 +++
include/asm-generic/barrier.h | 24 ++++++++
include/asm-generic/rwonce.h | 87 ++++++++++++++++++++++++++
include/linux/bitops.h | 2 +-
include/linux/compiler.h | 94 -----------------------------
include/linux/compiler_types.h | 46 ++++++++++++++
include/linux/list.h | 1 +
14 files changed, 211 insertions(+), 95 deletions(-)
create mode 100644 arch/arm/include/asm/barrier.h
create mode 100644 arch/mips/include/asm/barrier.h
create mode 100644 arch/openrisc/include/asm/barrier.h
create mode 100644 arch/powerpc/include/asm/barrier.h
create mode 100644 arch/sandbox/include/asm/barrier.h
create mode 100644 arch/x86/include/asm/barrier.h
create mode 100644 include/asm-generic/barrier.h
create mode 100644 include/asm-generic/rwonce.h
diff --git a/Documentation/devel/porting.rst b/Documentation/devel/porting.rst
index e3904eea067c..9dab2a301f2a 100644
--- a/Documentation/devel/porting.rst
+++ b/Documentation/devel/porting.rst
@@ -488,6 +488,8 @@ Your architecture needs to implement following headers:
Only if ``HAS_DMA`` is selected by the architecture.
- ``<asm/io.h>``
Defines I/O memory and port accessors
+ - ``<asm/barrier.h>``
+ Can normally just include ``<asm-generic/barrier.h>``
- ``<asm/mmu.h>``
- ``<asm/string.h>``
- ``<asm/swab.h>``
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
new file mode 100644
index 000000000000..5eb06065cd9f
--- /dev/null
+++ b/arch/arm/include/asm/barrier.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_BARRIER_H
+#define _ASM_BARRIER_H
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/kvx/include/asm/barrier.h b/arch/kvx/include/asm/barrier.h
index 616b5f90a230..d08b4a98a62c 100644
--- a/arch/kvx/include/asm/barrier.h
+++ b/arch/kvx/include/asm/barrier.h
@@ -6,6 +6,8 @@
#ifndef _ASM_KVX_BARRIER_H
#define _ASM_KVX_BARRIER_H
+#include <asm-generic/barrier.h>
+
/* fence is sufficient to guarantee write ordering */
#define wmb() __builtin_kvx_fence()
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
new file mode 100644
index 000000000000..5eb06065cd9f
--- /dev/null
+++ b/arch/mips/include/asm/barrier.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_BARRIER_H
+#define _ASM_BARRIER_H
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/openrisc/include/asm/barrier.h b/arch/openrisc/include/asm/barrier.h
new file mode 100644
index 000000000000..5eb06065cd9f
--- /dev/null
+++ b/arch/openrisc/include/asm/barrier.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_BARRIER_H
+#define _ASM_BARRIER_H
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
new file mode 100644
index 000000000000..5eb06065cd9f
--- /dev/null
+++ b/arch/powerpc/include/asm/barrier.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_BARRIER_H
+#define _ASM_BARRIER_H
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/sandbox/include/asm/barrier.h b/arch/sandbox/include/asm/barrier.h
new file mode 100644
index 000000000000..5eb06065cd9f
--- /dev/null
+++ b/arch/sandbox/include/asm/barrier.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_BARRIER_H
+#define _ASM_BARRIER_H
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
new file mode 100644
index 000000000000..5eb06065cd9f
--- /dev/null
+++ b/arch/x86/include/asm/barrier.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_BARRIER_H
+#define _ASM_BARRIER_H
+
+#include <asm-generic/barrier.h>
+
+#endif
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
new file mode 100644
index 000000000000..fbfb67f3087f
--- /dev/null
+++ b/include/asm-generic/barrier.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Generic barrier definitions.
+ *
+ * It should be possible to use these on really simple architectures,
+ * but it serves more as a starting point for new ports.
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#ifndef __ASM_GENERIC_BARRIER_H
+#define __ASM_GENERIC_BARRIER_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/compiler.h>
+#include <asm-generic/rwonce.h>
+
+#ifndef nop
+#define nop() asm volatile ("nop")
+#endif
+
+#endif /* !__ASSEMBLY__ */
+#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
new file mode 100644
index 000000000000..53aad037902c
--- /dev/null
+++ b/include/asm-generic/rwonce.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Prevent the compiler from merging or refetching reads or writes. The
+ * compiler is also forbidden from reordering successive instances of
+ * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
+ * particular ordering. One way to make the compiler aware of ordering is to
+ * put the two invocations of READ_ONCE or WRITE_ONCE in different C
+ * statements.
+ *
+ * These two macros will also work on aggregate data types like structs or
+ * unions.
+ *
+ * Their two major use cases are: (1) Mediating communication between
+ * process-level code and irq/NMI handlers, all running on the same CPU,
+ * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
+ * mutilate accesses that either do not require ordering or that interact
+ * with an explicit memory barrier or atomic instruction that provides the
+ * required ordering.
+ */
+#ifndef __ASM_GENERIC_RWONCE_H
+#define __ASM_GENERIC_RWONCE_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/compiler_types.h>
+
+/*
+ * Yes, this permits 64-bit accesses on 32-bit architectures. These will
+ * actually be atomic in some cases (namely Armv7 + LPAE), but for others we
+ * rely on the access being split into 2x32-bit accesses for a 32-bit quantity
+ * (e.g. a virtual address) and a strong prevailing wind.
+ */
+#define compiletime_assert_rwonce_type(t) \
+ compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
+ "Unsupported access size for {READ,WRITE}_ONCE().")
+
+/*
+ * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
+ * atomicity. Note that this may result in tears!
+ */
+#ifndef __READ_ONCE
+#define __READ_ONCE(x) (*(const volatile __unqual_scalar_typeof(x) *)&(x))
+#endif
+
+#define READ_ONCE(x) \
+({ \
+ compiletime_assert_rwonce_type(x); \
+ __READ_ONCE(x); \
+})
+
+#define __WRITE_ONCE(x, val) \
+do { \
+ *(volatile typeof(x) *)&(x) = (val); \
+} while (0)
+
+#define WRITE_ONCE(x, val) \
+do { \
+ compiletime_assert_rwonce_type(x); \
+ __WRITE_ONCE(x, val); \
+} while (0)
+
+static __no_sanitize_or_inline
+unsigned long __read_once_word_nocheck(const void *addr)
+{
+ return __READ_ONCE(*(unsigned long *)addr);
+}
+
+/*
+ * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
+ * word from memory atomically but without telling KASAN/KCSAN. This is
+ * usually used by unwinding code when walking the stack of a running process.
+ */
+#define READ_ONCE_NOCHECK(x) \
+({ \
+ compiletime_assert(sizeof(x) == sizeof(unsigned long), \
+ "Unsupported access size for READ_ONCE_NOCHECK()."); \
+ (typeof(x))__read_once_word_nocheck(&(x)); \
+})
+
+static __no_kasan_or_inline
+unsigned long read_word_at_a_time(const void *addr)
+{
+ return *(unsigned long *)addr;
+}
+
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_GENERIC_RWONCE_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 44602ad5af84..b6500e4630b3 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -210,7 +210,7 @@ static inline void assign_bit(long nr, volatile unsigned long *addr, bool value)
typeof(*ptr) old, new; \
\
do { \
- old = ACCESS_ONCE(*ptr); \
+ old = READ_ONCE(*ptr); \
new = (old & ~mask) | bits; \
} while (cmpxchg(ptr, old, new) != old); \
\
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 6654c164f594..e7dc9ba0a590 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -168,26 +168,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
#include <linux/types.h>
-#define __READ_ONCE_SIZE \
-({ \
- switch (size) { \
- case 1: *(__u8 *)res = *(volatile __u8 *)p; break; \
- case 2: *(__u16 *)res = *(volatile __u16 *)p; break; \
- case 4: *(__u32 *)res = *(volatile __u32 *)p; break; \
- case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \
- default: \
- barrier(); \
- __builtin_memcpy((void *)res, (const void *)p, size); \
- barrier(); \
- } \
-})
-
-static __always_inline
-void __read_once_size(const volatile void *p, void *res, int size)
-{
- __READ_ONCE_SIZE;
-}
-
#ifdef CONFIG_KASAN
/*
* We can't declare function 'inline' because __no_sanitize_address confilcts
@@ -200,80 +180,6 @@ void __read_once_size(const volatile void *p, void *res, int size)
# define __no_kasan_or_inline __always_inline
#endif
-static __no_kasan_or_inline
-void __read_once_size_nocheck(const volatile void *p, void *res, int size)
-{
- __READ_ONCE_SIZE;
-}
-
-static __always_inline void __write_once_size(volatile void *p, void *res, int size)
-{
- switch (size) {
- case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
- case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
- case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
- case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
- default:
- barrier();
- __builtin_memcpy((void *)p, (const void *)res, size);
- barrier();
- }
-}
-
-/*
- * Prevent the compiler from merging or refetching reads or writes. The
- * compiler is also forbidden from reordering successive instances of
- * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
- * particular ordering. One way to make the compiler aware of ordering is to
- * put the two invocations of READ_ONCE or WRITE_ONCE in different C
- * statements.
- *
- * These two macros will also work on aggregate data types like structs or
- * unions. If the size of the accessed data type exceeds the word size of
- * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
- * fall back to memcpy(). There's at least two memcpy()s: one for the
- * __builtin_memcpy() and then one for the macro doing the copy of variable
- * - '__u' allocated on the stack.
- *
- * Their two major use cases are: (1) Mediating communication between
- * process-level code and irq/NMI handlers, all running on the same CPU,
- * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
- * mutilate accesses that either do not require ordering or that interact
- * with an explicit memory barrier or atomic instruction that provides the
- * required ordering.
- */
-
-#define __READ_ONCE(x, check) \
-({ \
- union { typeof(x) __val; char __c[1]; } __u; \
- if (check) \
- __read_once_size(&(x), __u.__c, sizeof(x)); \
- else \
- __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
- __u.__val; \
-})
-#define READ_ONCE(x) __READ_ONCE(x, 1)
-
-/*
- * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need
- * to hide memory access from KASAN.
- */
-#define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
-
-static __no_kasan_or_inline
-unsigned long read_word_at_a_time(const void *addr)
-{
- return *(unsigned long *)addr;
-}
-
-#define WRITE_ONCE(x, val) \
-({ \
- union { typeof(x) __val; char __c[1]; } __u = \
- { .__val = (__force typeof(x)) (val) }; \
- __write_once_size(&(x), __u.__c, sizeof(x)); \
- __u.__val; \
-})
-
#endif /* __KERNEL__ */
/**
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index d208004b3989..87ab117aca13 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -222,6 +222,28 @@ struct ftrace_likely_data {
/* Are two types/vars the same type (ignoring qualifiers)? */
#define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
+/*
+ * __unqual_scalar_typeof(x) - Declare an unqualified scalar type, leaving
+ * non-scalar types unchanged.
+ */
+/*
+ * Prefer C11 _Generic for better compile-times and simpler code. Note: 'char'
+ * is not type-compatible with 'signed char', and we define a separate case.
+ */
+#define __scalar_type_to_expr_cases(type) \
+ unsigned type: (unsigned type)0, \
+ signed type: (signed type)0
+
+#define __unqual_scalar_typeof(x) typeof( \
+ _Generic((x), \
+ char: (char)0, \
+ __scalar_type_to_expr_cases(char), \
+ __scalar_type_to_expr_cases(short), \
+ __scalar_type_to_expr_cases(int), \
+ __scalar_type_to_expr_cases(long), \
+ __scalar_type_to_expr_cases(long long), \
+ default: (x)))
+
/* Is this type a native word size -- useful for atomic operations */
#define __native_word(t) \
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
@@ -356,6 +378,30 @@ struct ftrace_likely_data {
*/
#define noinline_for_stack noinline
+/*
+ * Sanitizer helper attributes: Because using __always_inline and
+ * __no_sanitize_* conflict, provide helper attributes that will either expand
+ * to __no_sanitize_* in compilation units where instrumentation is enabled
+ * (__SANITIZE_*__), or __always_inline in compilation units without
+ * instrumentation (__SANITIZE_*__ undefined).
+ */
+#ifdef __SANITIZE_ADDRESS__
+/*
+ * We can't declare function 'inline' because __no_sanitize_address conflicts
+ * with inlining. Attempt to inline it may cause a build failure.
+ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
+ * '__maybe_unused' allows us to avoid defined-but-not-used warnings.
+ */
+# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
+# define __no_sanitize_or_inline __no_kasan_or_inline
+#else
+# define __no_kasan_or_inline __always_inline
+#endif
+
+#ifndef __no_sanitize_or_inline
+#define __no_sanitize_or_inline __always_inline
+#endif
+
/* code that can't be instrumented at all */
#define noinstr \
noinline notrace __no_sanitize_address __no_stack_protector
diff --git a/include/linux/list.h b/include/linux/list.h
index 35b3f573806a..b90ea3e125d0 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -7,6 +7,7 @@
#include <linux/stddef.h>
#include <linux/poison.h>
#include <linux/const.h>
+#include <asm/barrier.h>
/*
* Simple doubly linked list implementation.
--
2.39.5
* [PATCH 2/4] ARM64: mmu: document granule size calculation
From: Ahmad Fatoum @ 2025-05-21 17:41 UTC
To: barebox; +Cc: Ahmad Fatoum
Copy a nice-looking ASCII table from U-Boot's cache_v8.c to make the
granule_size() calculation easier to follow.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
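As a cross-check of the table (my summary, not part of the patch): with a 4k
granule each level resolves 9 bits, so for levels 1-3 a level-n entry spans
1 << (12 + 9 * (3 - n)) bytes, matching the switch below:

  /* hypothetical equivalent of granule_size() for levels 1-3 */
  static uint64_t granule_size_4k(int level)
  {
          return 1ULL << (12 + 9 * (3 - level)); /* 3 -> 4K, 2 -> 2M, 1 -> 1G */
  }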
arch/arm/cpu/mmu_64.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/arm/cpu/mmu_64.c b/arch/arm/cpu/mmu_64.c
index bc1a44d0a7b8..d9b0b74d7314 100644
--- a/arch/arm/cpu/mmu_64.c
+++ b/arch/arm/cpu/mmu_64.c
@@ -176,6 +176,24 @@ static void create_sections(uint64_t virt, uint64_t phys, uint64_t size,
static size_t granule_size(int level)
{
+ /*
+ * With 4k page granule, a virtual address is split into 4 lookup parts
+ * spanning 9 bits each:
+ *
+ * _______________________________________________
+ * | | | | | | |
+ * | 0 | Lv0 | Lv1 | Lv2 | Lv3 | off |
+ * |_______|_______|_______|_______|_______|_______|
+ * 63-48 47-39 38-30 29-21 20-12 11-00
+ *
+ * mask page size
+ *
+ * Lv0: FF8000000000 --
+ * Lv1: 7FC0000000 1G
+ * Lv2: 3FE00000 2M
+ * Lv3: 1FF000 4K
+ * off: FFF
+ */
switch (level) {
default:
case 0:
--
2.39.5
* [PATCH 3/4] ARM: mmu: ensure PTE is written at once
From: Ahmad Fatoum @ 2025-05-21 17:41 UTC
To: barebox; +Cc: Ahmad Fatoum
Writing the PTE with a normal, non-volatile access is a bad idea, because
a speculative page-table walk may already consume a partially written entry.

Use WRITE_ONCE to turn this into a volatile write via a new set_pte
helper. The extra helper is useful for type enforcement and for future
extension once we start to consistently apply break-before-make[1].

[1]: https://lore.kernel.org/all/87a5btrcyr.wl-maz@kernel.org/
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
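For context, the break-before-make sequence the TODOs refer to would wrap
each update roughly as follows (sketch only, ptep/new_val are placeholders;
the helpers are the ones visible in the arm32 diff):

  set_pte(ptep, 0);                      /* break: install an invalid (fault) entry */
  dma_flush_range(ptep, sizeof(*ptep));  /* make the faulting entry visible to the walker */
  tlb_invalidate();                      /* drop any cached translation */
  set_pte(ptep, new_val);                /* make: install the new entry */
  dma_flush_range(ptep, sizeof(*ptep));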
arch/arm/cpu/mmu_32.c | 40 +++++++++++++++++++++++++++++-----------
arch/arm/cpu/mmu_64.c | 20 +++++++++++++-------
2 files changed, 42 insertions(+), 18 deletions(-)
diff --git a/arch/arm/cpu/mmu_32.c b/arch/arm/cpu/mmu_32.c
index ec6bd27da4e1..9f50194c7c2b 100644
--- a/arch/arm/cpu/mmu_32.c
+++ b/arch/arm/cpu/mmu_32.c
@@ -65,6 +65,11 @@ static bool pgd_type_table(u32 pgd)
#define PTE_SIZE (PTRS_PER_PTE * sizeof(u32))
+static void set_pte(uint32_t *pt, uint32_t val)
+{
+ WRITE_ONCE(*pt, val);
+}
+
#ifdef __PBL__
static uint32_t *alloc_pte(void)
{
@@ -141,13 +146,14 @@ static u32 *arm_create_pte(unsigned long virt, unsigned long phys,
ttb_idx = pgd_index(virt);
for (i = 0; i < PTRS_PER_PTE; i++) {
- table[i] = phys | PTE_TYPE_SMALL | flags;
+ set_pte(&table[i], phys | PTE_TYPE_SMALL | flags);
virt += PAGE_SIZE;
phys += PAGE_SIZE;
}
dma_flush_range(table, PTRS_PER_PTE * sizeof(u32));
- ttb[ttb_idx] = (unsigned long)table | PMD_TYPE_TABLE;
+ // TODO break-before-make missing
+ set_pte(&ttb[ttb_idx], (unsigned long)table | PMD_TYPE_TABLE);
dma_flush_range(&ttb[ttb_idx], sizeof(u32));
return table;
@@ -264,14 +270,17 @@ static void __arch_remap_range(void *_virt_addr, phys_addr_t phys_addr, size_t s
if (size >= PGDIR_SIZE && pgdir_size_aligned &&
IS_ALIGNED(phys_addr, PGDIR_SIZE) &&
!pgd_type_table(*pgd)) {
+ u32 val;
/*
* TODO: Add code to discard a page table and
* replace it with a section
*/
chunk = PGDIR_SIZE;
- *pgd = phys_addr | pmd_flags;
+ val = phys_addr | pmd_flags;
if (map_type != MAP_FAULT)
- *pgd |= PMD_TYPE_SECT;
+ val |= PMD_TYPE_SECT;
+ // TODO break-before-make missing
+ set_pte(pgd, val);
dma_flush_range(pgd, sizeof(*pgd));
} else {
unsigned int num_ptes;
@@ -310,10 +319,15 @@ static void __arch_remap_range(void *_virt_addr, phys_addr_t phys_addr, size_t s
}
for (i = 0; i < num_ptes; i++) {
- pte[i] = phys_addr + i * PAGE_SIZE;
- pte[i] |= pte_flags;
+ u32 val;
+
+ val = phys_addr + i * PAGE_SIZE;
+ val |= pte_flags;
if (map_type != MAP_FAULT)
- pte[i] |= PTE_TYPE_SMALL;
+ val |= PTE_TYPE_SMALL;
+
+ // TODO break-before-make missing
+ set_pte(&pte[i], val);
}
dma_flush_range(pte, num_ptes * sizeof(u32));
@@ -350,7 +364,7 @@ static void create_sections(unsigned long first, unsigned long last,
unsigned int i, addr = first;
for (i = ttb_start; i < ttb_end; i++) {
- ttb[i] = addr | flags;
+ set_pte(&ttb[i], addr | flags);
addr += PGDIR_SIZE;
}
}
@@ -366,8 +380,10 @@ void *map_io_sections(unsigned long phys, void *_start, size_t size)
unsigned long start = (unsigned long)_start, sec;
uint32_t *ttb = get_ttb();
- for (sec = start; sec < start + size; sec += PGDIR_SIZE, phys += PGDIR_SIZE)
- ttb[pgd_index(sec)] = phys | get_pmd_flags(MAP_UNCACHED);
+ for (sec = start; sec < start + size; sec += PGDIR_SIZE, phys += PGDIR_SIZE) {
+ // TODO break-before-make missing
+ set_pte(&ttb[pgd_index(sec)], phys | get_pmd_flags(MAP_UNCACHED));
+ }
dma_flush_range(ttb, 0x4000);
tlb_invalidate();
@@ -410,7 +426,9 @@ static void create_vector_table(unsigned long adr)
vectors, adr);
arm_create_pte(adr, adr, get_pte_flags(MAP_UNCACHED));
pte = find_pte(adr);
- *pte = (u32)vectors | PTE_TYPE_SMALL | get_pte_flags(MAP_CACHED);
+ // TODO break-before-make missing
+ set_pte(pte, (u32)vectors | PTE_TYPE_SMALL |
+ get_pte_flags(MAP_CACHED));
}
arm_fixup_vectors();
diff --git a/arch/arm/cpu/mmu_64.c b/arch/arm/cpu/mmu_64.c
index d9b0b74d7314..efd788c93e7b 100644
--- a/arch/arm/cpu/mmu_64.c
+++ b/arch/arm/cpu/mmu_64.c
@@ -31,12 +31,14 @@ static uint64_t *get_ttb(void)
return (uint64_t *)get_ttbr(current_el());
}
+static void set_pte(uint64_t *pt, uint64_t val)
+{
+ WRITE_ONCE(*pt, val);
+}
+
static void set_table(uint64_t *pt, uint64_t *table_addr)
{
- uint64_t val;
-
- val = PTE_TYPE_TABLE | (uint64_t)table_addr;
- *pt = val;
+ set_pte(pt, PTE_TYPE_TABLE | (uint64_t)table_addr);
}
#ifdef __PBL__
@@ -114,14 +116,16 @@ static void split_block(uint64_t *pte, int level)
for (i = 0; i < MAX_PTE_ENTRIES; i++) {
- new_table[i] = old_pte | (i << levelshift);
+ set_pte(&new_table[i], old_pte | (i << levelshift));
/* Level 3 block PTEs have the table type */
if ((level + 1) == 3)
new_table[i] |= PTE_TYPE_TABLE;
}
- /* Set the new table into effect */
+ /* Set the new table into effect
+ * TODO: break-before-make missing
+ */
set_table(pte, new_table);
}
@@ -157,7 +161,9 @@ static void create_sections(uint64_t virt, uint64_t phys, uint64_t size,
IS_ALIGNED(phys, block_size)) {
type = (level == 3) ?
PTE_TYPE_PAGE : PTE_TYPE_BLOCK;
- *pte = phys | attr | type;
+
+ /* TODO: break-before-make missing */
+ set_pte(pte, phys | attr | type);
addr += block_size;
phys += block_size;
size -= block_size;
--
2.39.5
* [PATCH 4/4] WIP: break before make and don't invalidate uncached regions being remapped
From: Ahmad Fatoum @ 2025-05-21 17:41 UTC
To: barebox; +Cc: Ahmad Fatoum
This needs to be split up and cleaned up, but I include it anyway to show
what still needs to be done, here for arm32; the same applies to arm64 as
well.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
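The net effect on the remap path, condensed from the hunks below: instead of
invalidating the whole range after remapping it uncached, we now write back
only those pages that were actually mapped cacheable before touching the
tables:

  int arch_remap_range(void *virt_addr, phys_addr_t phys_addr, size_t size, unsigned map_type)
  {
          if (map_type != MAP_CACHED)
                  flush_cacheable_pages(virt_addr, size);

          __arch_remap_range(virt_addr, phys_addr, size, map_type);

          return 0;
  }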
arch/arm/cpu/mmu_32.c | 253 +++++++++++++++++++++++++++++++++---------
1 file changed, 201 insertions(+), 52 deletions(-)
diff --git a/arch/arm/cpu/mmu_32.c b/arch/arm/cpu/mmu_32.c
index 9f50194c7c2b..d4ed298ac64f 100644
--- a/arch/arm/cpu/mmu_32.c
+++ b/arch/arm/cpu/mmu_32.c
@@ -70,6 +70,45 @@ static void set_pte(uint32_t *pt, uint32_t val)
WRITE_ONCE(*pt, val);
}
+static void set_pte_range(uint32_t *virt, phys_addr_t phys,
+ size_t count, uint32_t flags,
+ bool break_before_make)
+{
+ bool made = false;
+
+ if (!break_before_make)
+ goto write_attrs;
+
+ if ((flags & PTE_TYPE_MASK) == PTE_TYPE_FAULT)
+ phys = 0;
+
+ for (int i = 0; i < count; i++) {
+ if (READ_ONCE(virt[i]) == ((phys + i * PAGE_SIZE) | flags))
+ continue;
+ set_pte(&virt[i], PTE_TYPE_FAULT);
+ made = true;
+ }
+
+ if (made) {
+ dma_flush_range(virt, count * sizeof(u32));
+ tlb_invalidate();
+ } else {
+ break_before_make = false;
+ }
+
+write_attrs:
+ for (int i = 0; i < count; i++, phys += PAGE_SIZE)
+ set_pte(&virt[i], phys | flags);
+
+ dma_flush_range(virt, count * sizeof(u32));
+
+#if 0
+ pr_notice("%s(0x%08x+0x%zx -> 0x%08x, flags=0x%x%s)\n", __func__,
+ (unsigned)virt, count, phys, flags,
+ made ? " [BBM]" : break_before_make ? " [BBM, but unneeded]" : "");
+#endif
+}
+
#ifdef __PBL__
static uint32_t *alloc_pte(void)
{
@@ -89,30 +128,47 @@ static uint32_t *alloc_pte(void)
}
#endif
-static u32 *find_pte(unsigned long adr)
+static u32 *__find_pte(uint32_t *ttb, unsigned long adr, int *level)
{
+ u32 *pgd = (u32 *)&ttb[pgd_index(adr)];
u32 *table;
- uint32_t *ttb = get_ttb();
- if (!pgd_type_table(ttb[pgd_index(adr)]))
- return NULL;
+ if (!pgd_type_table(*pgd)) {
+ *level = 1;
+ return pgd;
+ }
+
+ *level = 2;
/* find the coarse page table base address */
- table = (u32 *)(ttb[pgd_index(adr)] & ~0x3ff);
+ table = (u32 *)(*pgd & ~0x3ff);
/* find second level descriptor */
return &table[(adr >> PAGE_SHIFT) & 0xff];
}
+static u32 *find_pte(unsigned long adr)
+{
+ int level;
+ u32 *pte = __find_pte(get_ttb(), adr, &level);
+
+ return level == 2 ? pte : NULL;
+}
+
+static void dma_flush_range_end(unsigned long start, unsigned long end)
+{
+ __dma_flush_range(start, end);
+
+ if (outer_cache.flush_range)
+ outer_cache.flush_range(start, end);
+}
+
void dma_flush_range(void *ptr, size_t size)
{
unsigned long start = (unsigned long)ptr;
unsigned long end = start + size;
- __dma_flush_range(start, end);
-
- if (outer_cache.flush_range)
- outer_cache.flush_range(start, end);
+ dma_flush_range_end(start, end);
}
void dma_inv_range(void *ptr, size_t size)
@@ -132,11 +188,11 @@ void dma_inv_range(void *ptr, size_t size)
* Not yet exported, but may be later if someone finds use for it.
*/
static u32 *arm_create_pte(unsigned long virt, unsigned long phys,
- uint32_t flags)
+ uint32_t flags, bool break_before_make)
{
uint32_t *ttb = get_ttb();
u32 *table;
- int i, ttb_idx;
+ int ttb_idx;
virt = ALIGN_DOWN(virt, PGDIR_SIZE);
phys = ALIGN_DOWN(phys, PGDIR_SIZE);
@@ -145,16 +201,11 @@ static u32 *arm_create_pte(unsigned long virt, unsigned long phys,
ttb_idx = pgd_index(virt);
- for (i = 0; i < PTRS_PER_PTE; i++) {
- set_pte(&table[i], phys | PTE_TYPE_SMALL | flags);
- virt += PAGE_SIZE;
- phys += PAGE_SIZE;
- }
- dma_flush_range(table, PTRS_PER_PTE * sizeof(u32));
+ set_pte_range(table, phys, PTRS_PER_PTE, PTE_TYPE_SMALL | flags,
+ break_before_make);
- // TODO break-before-make missing
- set_pte(&ttb[ttb_idx], (unsigned long)table | PMD_TYPE_TABLE);
- dma_flush_range(&ttb[ttb_idx], sizeof(u32));
+ set_pte_range(&ttb[ttb_idx], (unsigned long)table, 1,
+ PMD_TYPE_TABLE, break_before_make);
return table;
}
@@ -243,6 +294,22 @@ static uint32_t get_pte_flags(int map_type)
}
}
+static const char *map_type_tostr(int map_type)
+{
+ switch (map_type) {
+ case MAP_CACHED:
+ return "CACHED";
+ case MAP_UNCACHED:
+ return "UNCACHED";
+ case ARCH_MAP_WRITECOMBINE:
+ return "WRITECOMBINE";
+ case MAP_FAULT:
+ return "FAULT";
+ default:
+ return "<unknown>";
+ }
+}
+
static uint32_t get_pmd_flags(int map_type)
{
return pte_flags_to_pmd(get_pte_flags(map_type));
@@ -250,6 +317,7 @@ static uint32_t get_pmd_flags(int map_type)
static void __arch_remap_range(void *_virt_addr, phys_addr_t phys_addr, size_t size, unsigned map_type)
{
+ bool mmu_on;
u32 virt_addr = (u32)_virt_addr;
u32 pte_flags, pmd_flags;
uint32_t *ttb = get_ttb();
@@ -262,6 +330,13 @@ static void __arch_remap_range(void *_virt_addr, phys_addr_t phys_addr, size_t s
size = PAGE_ALIGN(size);
+ mmu_on = get_cr() & CR_M;
+
+ pr_info("[MMU %s]remapping 0x%08x+0x%zx: phys 0x%08lx, type %s\n",
+ get_cr() & CR_M ? " ON" : "OFF",
+ virt_addr, size, (ulong)phys_addr,
+ map_type_tostr(map_type));
+
while (size) {
const bool pgdir_size_aligned = IS_ALIGNED(virt_addr, PGDIR_SIZE);
u32 *pgd = (u32 *)&ttb[pgd_index(virt_addr)];
@@ -270,22 +345,20 @@ static void __arch_remap_range(void *_virt_addr, phys_addr_t phys_addr, size_t s
if (size >= PGDIR_SIZE && pgdir_size_aligned &&
IS_ALIGNED(phys_addr, PGDIR_SIZE) &&
!pgd_type_table(*pgd)) {
- u32 val;
+ u32 flags;
/*
* TODO: Add code to discard a page table and
* replace it with a section
*/
chunk = PGDIR_SIZE;
- val = phys_addr | pmd_flags;
+ flags = pmd_flags;
if (map_type != MAP_FAULT)
- val |= PMD_TYPE_SECT;
- // TODO break-before-make missing
- set_pte(pgd, val);
- dma_flush_range(pgd, sizeof(*pgd));
+ flags |= PMD_TYPE_SECT;
+ set_pte_range(pgd, phys_addr, 1, flags, mmu_on);
} else {
unsigned int num_ptes;
u32 *table = NULL;
- unsigned int i;
+ u32 flags;
u32 *pte;
/*
* We only want to cover pages up until next
@@ -313,24 +386,16 @@ static void __arch_remap_range(void *_virt_addr, phys_addr_t phys_addr, size_t s
* create a new page table for it
*/
table = arm_create_pte(virt_addr, phys_addr,
- pmd_flags_to_pte(*pgd));
+ pmd_flags_to_pte(*pgd), mmu_on);
pte = find_pte(virt_addr);
BUG_ON(!pte);
}
- for (i = 0; i < num_ptes; i++) {
- u32 val;
+ flags = pte_flags;
+ if (map_type != MAP_FAULT)
+ flags |= PTE_TYPE_SMALL;
- val = phys_addr + i * PAGE_SIZE;
- val |= pte_flags;
- if (map_type != MAP_FAULT)
- val |= PTE_TYPE_SMALL;
-
- // TODO break-before-make missing
- set_pte(&pte[i], val);
- }
-
- dma_flush_range(pte, num_ptes * sizeof(u32));
+ set_pte_range(pte, phys_addr, num_ptes, flags, mmu_on);
}
virt_addr += chunk;
@@ -345,12 +410,99 @@ static void early_remap_range(u32 addr, size_t size, unsigned map_type)
__arch_remap_range((void *)addr, addr, size, map_type);
}
+static size_t granule_size(int level)
+{
+ switch (level) {
+ default:
+ case 1:
+ return PGDIR_SIZE;
+ case 2:
+ return PAGE_SIZE;
+ }
+}
+
+static bool pte_is_cacheable(uint32_t pte, int level)
+{
+ return (level == 2 && (pte & PTE_CACHEABLE)) ||
+ (level == 1 && (pte & PMD_SECT_CACHEABLE));
+}
+
+/**
+ * flush_cacheable_pages - Flush only the cacheable pages in a region
+ * @start: Starting virtual address of the range.
+ * @end: Ending virtual address of the range.
+ *
+ * This function walks the page table and flushes the data caches for the
+ * specified range only if the memory is marked as normal cacheable in the
+ * page tables. If a non-cacheable or non-normal page is encountered,
+ * it's skipped.
+ */
+static void flush_cacheable_pages(void *start, size_t size)
+{
+ u32 flush_start = ~0UL, flush_end = ~0UL;
+ u32 region_start, region_end;
+ size_t block_size;
+ u32 *ttb;
+
+ region_start = PAGE_ALIGN_DOWN((ulong)start);
+ region_end = PAGE_ALIGN(region_start + size);
+
+ ttb = get_ttb();
+
+ /*
+ * TODO: This loop could be made more optimal by inlining the page walk,
+ * so we need not restart address translation from the top every time.
+ *
+ * The hope is that with the page tables being cached and the
+ * windows being remapped being small, the overhead compared to
+ * actually flushing the ranges isn't too significant.
+ */
+ for (u32 addr = region_start; addr < region_end; addr += block_size) {
+ int level;
+ u32 *pte = __find_pte(ttb, addr, &level);
+
+ block_size = granule_size(level);
+
+ if (!pte || !pte_is_cacheable(*pte, level))
+ continue;
+
+ if (flush_end == addr) {
+ /*
+ * While it's safe to flush the whole block_size,
+ * it's unnecessary time waste to go beyond region_end.
+ */
+ flush_end = min(flush_end + block_size, region_end);
+ continue;
+ }
+
+ /*
+ * We don't have a previous contiguous flush area to append to.
+ * If we recorded any area before, let's flush it now
+ */
+ if (flush_start != ~0U) {
+ pr_notice("flushing %x-%x\n", flush_start, flush_end);
+ dma_flush_range_end(flush_start, flush_end);
+ }
+
+ /* and start the new contiguous flush area with this page */
+ flush_start = addr;
+ flush_end = min(flush_start + block_size, region_end);
+ }
+
+ /* The previous loop won't flush the last cached range, so do it here */
+ if (flush_start != ~0UL) {
+ pr_notice("flushing %x-%x\n", flush_start, flush_end);
+ dma_flush_range_end(flush_start, flush_end);
+ }
+}
+
int arch_remap_range(void *virt_addr, phys_addr_t phys_addr, size_t size, unsigned map_type)
{
- __arch_remap_range(virt_addr, phys_addr, size, map_type);
- if (map_type == MAP_UNCACHED)
- dma_inv_range(virt_addr, size);
+ if (map_type != MAP_CACHED)
+ flush_cacheable_pages(virt_addr, size);
+
+ __arch_remap_range(virt_addr, phys_addr, size, map_type);
return 0;
}
@@ -377,13 +529,11 @@ static inline void create_flat_mapping(void)
void *map_io_sections(unsigned long phys, void *_start, size_t size)
{
- unsigned long start = (unsigned long)_start, sec;
+ unsigned long start = (unsigned long)_start;
uint32_t *ttb = get_ttb();
- for (sec = start; sec < start + size; sec += PGDIR_SIZE, phys += PGDIR_SIZE) {
- // TODO break-before-make missing
- set_pte(&ttb[pgd_index(sec)], phys | get_pmd_flags(MAP_UNCACHED));
- }
+ set_pte_range(&ttb[pgd_index(start)], phys, size / PGDIR_SIZE,
+ get_pmd_flags(MAP_UNCACHED), true);
dma_flush_range(ttb, 0x4000);
tlb_invalidate();
@@ -424,11 +574,10 @@ static void create_vector_table(unsigned long adr)
vectors = xmemalign(PAGE_SIZE, PAGE_SIZE);
pr_debug("Creating vector table, virt = 0x%p, phys = 0x%08lx\n",
vectors, adr);
- arm_create_pte(adr, adr, get_pte_flags(MAP_UNCACHED));
+ arm_create_pte(adr, adr, get_pte_flags(MAP_UNCACHED), true);
pte = find_pte(adr);
- // TODO break-before-make missing
- set_pte(pte, (u32)vectors | PTE_TYPE_SMALL |
- get_pte_flags(MAP_CACHED));
+ set_pte_range(pte, (u32)vectors, 1, PTE_TYPE_SMALL |
+ get_pte_flags(MAP_CACHED), true);
}
arm_fixup_vectors();
--
2.39.5
* Re: (subset) [PATCH 0/4] ARM: mmu: ensure PTE is written at once
From: Sascha Hauer @ 2025-05-26 13:51 UTC
To: barebox, Ahmad Fatoum
On Wed, 21 May 2025 19:41:00 +0200, Ahmad Fatoum wrote:
> Writing the PTE with a normal, non-volatile access is a bad idea, because
> a speculative page-table walk may already consume a partially written entry.
>
> Use WRITE_ONCE to turn this into a volatile write via a new set_pte
> helper. The extra helper is useful for type enforcement and for future
> extension once we start to consistently apply break-before-make[1].
>
> [...]
Applied, thanks!
[1/4] arch: sync READ_ONCE/WRITE_ONCE with Linux
https://git.pengutronix.de/cgit/barebox/commit/?id=cacff83edeef (link may not be stable)
[2/4] ARM64: mmu: document granule size calculation
https://git.pengutronix.de/cgit/barebox/commit/?id=ebe6d3b8b732 (link may not be stable)
[3/4] ARM: mmu: ensure PTE is written at once
https://git.pengutronix.de/cgit/barebox/commit/?id=cc909e5d084e (link may not be stable)
Best regards,
--
Sascha Hauer <s.hauer@pengutronix.de>