* [PATCH v2 1/4] ARM: cache-armv7: work around Cortex-A7 erratum 814220
2019-04-25 14:32 [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Ahmad Fatoum
@ 2019-04-25 14:32 ` Ahmad Fatoum
2019-04-25 14:32 ` [PATCH v2 2/4] ARM: cache-armv7: start invalidation from outer levels Ahmad Fatoum
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Ahmad Fatoum @ 2019-04-25 14:32 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum, lst, sam
This patch is based on 0e9a87bb from the linux-imx kernel:
------------------------------------------------------------------------------
| ARM/MP: 814220—B-Cache maintenance by set/way operations can execute
| out of order.
|
| Description:
| The v7 ARM states that all cache and branch predictor maintenance operations
| that do not specify an address execute, relative to each other, in program
| order. However, because of this erratum, an L2 set/way cache maintenance
| operation can overtake an L1 set/way cache maintenance operation, this would
| cause the data corruption.
|
| This ERRATA affected the Cortex-A7 and present in r0p2, r0p3, r0p4, r0p5.
|
| This patch is the SW workaround by adding a DSB before changing cache levels
| as the ARM ERRATA: ARM/MP: 814220 told in the ARM ERRATA documentation.
| Signed-off-by: Jason Liu <r64343@freescale.com>
------------------------------------------------------------------------------
It was later posted to LKML for Linux inclusion, but is not yet mainline:
<20190214083145.15148-1-benjamin.gaignard@linaro.org>
Unlike the Linux version, we don't make the barrier dependent on a
Kconfig option and always execute the dsb:
On 25/4/19 12:02, Lucas Stach wrote:
> I don't think we need a Kconfig option here. This function is not
> really performance critical. The short pipeline stall introduced by the
> dsb when switching the cache level is minor compared to the time it
> takes to actually move the cache blocks on a clean.
>
> Just always execute the [dsb] and add a comment on why it is needed.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
arch/arm/cpu/cache-armv7.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm/cpu/cache-armv7.S b/arch/arm/cpu/cache-armv7.S
index 7a1c5c01894c..9487dd03f6cc 100644
--- a/arch/arm/cpu/cache-armv7.S
+++ b/arch/arm/cpu/cache-armv7.S
@@ -120,6 +120,7 @@ THUMB( ite eq )
skip:
add r12, r12, #2 @ increment cache number
cmp r3, r12
+ dsb @ work-around Cortex-A7 erratum 814220
bgt loop1
finished:
ldmfd sp!, {r4-r11}
--
2.20.1
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 2/4] ARM: cache-armv7: start invalidation from outer levels
2019-04-25 14:32 [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Ahmad Fatoum
2019-04-25 14:32 ` [PATCH v2 1/4] ARM: cache-armv7: work around Cortex-A7 erratum 814220 Ahmad Fatoum
@ 2019-04-25 14:32 ` Ahmad Fatoum
2019-04-25 14:38 ` [PATCH v2 2/4] fixup! " Ahmad Fatoum
2019-04-25 14:32 ` [PATCH v2 3/4] ARM: mmu: remove doubly defined macro Ahmad Fatoum
` (2 subsequent siblings)
4 siblings, 1 reply; 7+ messages in thread
From: Ahmad Fatoum @ 2019-04-25 14:32 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum, lst, sam
On 25/4/19 11:57, Lucas Stach wrote:
> [T]he sequence that could go wrong in Barebox is as follows:
> 1. CPU core starts invalidating at L1 cache level
> 2. HW prefetcher decides that a specific address should be brought into
> the L1 cache.
> 3. HW prefetcher finds a valid block for the requested address in L2
> cache and moves cached data from L2 to L1.
> 4. Only now CPU core invalidates L2 cache.
>
> In the above sequence we now have invalid data in the L1 cache line.
> The correct sequence will avoid this issue:
>
> 1. CPU core starts invalidating at L2 cache level
> 2. HW prefetcher decides that a specific address should be brought into
> the L1 cache.
> 3. HW prefetcher sees invalid tags for the requested address in L2
> cache and brings in the data from external memory.
> 4. CPU core invalidates L1 cache, discarding the prefetched data.
>
The ARM Cortex-A Series Programmer's Guide addresses this issue
in the SMP-context[1]:
> If another core were to access the affected address between those
> two actions, a coherency problem can occur. Such problems can be avoided
> by following two simple rules.
>
> * When cleaning, always clean the innermost (L1) cache first and then
> clean the outer cache(s).
> * When invalidating, always invalidate the outermost cache first and
> the L1 cache last.
The current code correctly iterates from inner to outer cache levels
when flushing/cleaning (r8 == 0), invalidation (r8 == 1) occurs in the
same direction though. Adjust the invalidation iteration order to start
from the outermost cache instead.
Equivalent C-Code:
enum cache_op { CACHE_FLUSH = 0, CACHE_INVAL = 1 };
register enum cache_op operation asm("r8");
register int i asm("r12");
register int limit asm("r3") = max_cache_level << 1; // e.g. 4 with L2 max
+if (operation == CACHE_FLUSH) {
i = 0;
+} else {
+ i = limit - 2;
+}
bool loop_again;
do {
/* [snip] */
+ if (operation == CACHE_FLUSH) {
i += 2;
loop_again = limit > i;
+ } else {
+ loop_again = i > 0;
+ i -= 2;
+ }
} while (loop_again);
[1]: 18.6 "TLB and cache maintenance broadcast", Version 4.0
Suggested-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
arch/arm/cpu/cache-armv7.S | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/arm/cpu/cache-armv7.S b/arch/arm/cpu/cache-armv7.S
index 9487dd03f6cc..21e189cda1d7 100644
--- a/arch/arm/cpu/cache-armv7.S
+++ b/arch/arm/cpu/cache-armv7.S
@@ -83,7 +83,10 @@ hierarchical:
ands r3, r0, #0x7000000 @ extract loc from clidr
mov r3, r3, lsr #23 @ left align loc bit field
beq finished @ if loc is 0, then no need to clean
- mov r12, #0 @ start clean at cache level 0
+ cmp r8, #0
+THUMB( ite eq )
+ moveq r12, #0
+ subne r12, r3, #2 @ start invalidate at outmost cache level
loop1:
add r2, r12, r12, lsr #1 @ work out 3x current cache level
mov r1, r0, lsr r2 @ extract cache type bits from clidr
@@ -118,9 +121,16 @@ THUMB( ite eq )
subs r7, r7, #1 @ decrement the index
bge loop2
skip:
+ cmp r8, #0
+ bne inval_check
add r12, r12, #2 @ increment cache number
cmp r3, r12
- dsb @ work-around Cortex-A7 erratum 814220
+ b loop_end_check
+inval_check:
+ cmp r12, #0
+ sub r12, r12, #2 @ decrement cache number
+loop_end_check:
+ dsb
bgt loop1
finished:
ldmfd sp!, {r4-r11}
--
2.20.1
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 3/4] ARM: mmu: remove doubly defined macro
2019-04-25 14:32 [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Ahmad Fatoum
2019-04-25 14:32 ` [PATCH v2 1/4] ARM: cache-armv7: work around Cortex-A7 erratum 814220 Ahmad Fatoum
2019-04-25 14:32 ` [PATCH v2 2/4] ARM: cache-armv7: start invalidation from outer levels Ahmad Fatoum
@ 2019-04-25 14:32 ` Ahmad Fatoum
2019-04-25 14:32 ` [PATCH v2 4/4] ARM: mmu: mark uncached regions as eXecute never on v7 Ahmad Fatoum
2019-04-29 6:59 ` [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Sascha Hauer
4 siblings, 0 replies; 7+ messages in thread
From: Ahmad Fatoum @ 2019-04-25 14:32 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum, lst, sam
PMD_SECT_DEF_CACHED is defined along with PMD_SECT_DEF_UNCACHED
in mmu.h, which is included two lines prior.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
arch/arm/cpu/mmu.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm/cpu/mmu.c b/arch/arm/cpu/mmu.c
index 29816ad56324..ed27d1e4b654 100644
--- a/arch/arm/cpu/mmu.c
+++ b/arch/arm/cpu/mmu.c
@@ -34,7 +34,6 @@
#include "mmu.h"
-#define PMD_SECT_DEF_CACHED (PMD_SECT_WB | PMD_SECT_DEF_UNCACHED)
#define PTRS_PER_PTE (PGDIR_SIZE / PAGE_SIZE)
#define ARCH_MAP_WRITECOMBINE ((unsigned)-1)
--
2.20.1
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 4/4] ARM: mmu: mark uncached regions as eXecute never on v7
2019-04-25 14:32 [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Ahmad Fatoum
` (2 preceding siblings ...)
2019-04-25 14:32 ` [PATCH v2 3/4] ARM: mmu: remove doubly defined macro Ahmad Fatoum
@ 2019-04-25 14:32 ` Ahmad Fatoum
2019-04-29 6:59 ` [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Sascha Hauer
4 siblings, 0 replies; 7+ messages in thread
From: Ahmad Fatoum @ 2019-04-25 14:32 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum, lst, sam
The ARM Cortex-A Series Programmer's Guide notes[1]:
> When set, the Execute Never (XN) bit in the translation table entry
> prevents speculative instruction fetches taking place from desired
> memory locations and will cause a prefetch abort to occur if execution
> from the memory location is attempted.
>
> Typically device memory regions are marked as execute never to prevent
> accidental execution from such locations, and to prevent undesirable
> side-effects which might be caused by speculative instruction fetches.
Heed the advice and mark uncached memory with the XN bit, when the
CPU is >=v7.
It's possible that there are SoCs that have a section shared between
device memory and the on-chip RAM hosting the PBL.
In such a section, every page except for the OCRAM's should be mapped XN,
but as we know of no SoC with such an OCRAM layout, we ignore this
possibility for now and let mmu_early_enable map sections only.
[1]: 9.6.3 "Execute Never", Version 4.0
Suggested-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
arch/arm/cpu/mmu-early.c | 27 ++++++++++++++++++++++++---
arch/arm/cpu/mmu.c | 15 ++++++++++-----
arch/arm/cpu/mmu.h | 8 +++++++-
3 files changed, 41 insertions(+), 9 deletions(-)
diff --git a/arch/arm/cpu/mmu-early.c b/arch/arm/cpu/mmu-early.c
index d39a03ed95d6..2f5876fc46d8 100644
--- a/arch/arm/cpu/mmu-early.c
+++ b/arch/arm/cpu/mmu-early.c
@@ -5,17 +5,20 @@
#include <asm/memory.h>
#include <asm/system.h>
#include <asm/cache.h>
+#include <asm-generic/sections.h>
#include "mmu.h"
static uint32_t *ttb;
-static void map_cachable(unsigned long start, unsigned long size)
+static inline void map_region(unsigned long start, unsigned long size,
+ uint64_t flags)
+
{
start = ALIGN_DOWN(start, SZ_1M);
size = ALIGN(size, SZ_1M);
- create_sections(ttb, start, start + size - 1, PMD_SECT_DEF_CACHED);
+ create_sections(ttb, start, start + size - 1, flags);
}
void mmu_early_enable(unsigned long membase, unsigned long memsize,
@@ -28,9 +31,27 @@ void mmu_early_enable(unsigned long membase, unsigned long memsize,
set_ttbr(ttb);
set_domain(DOMAIN_MANAGER);
+ /*
+ * This marks the whole address space as uncachable as well as
+ * unexecutable if possible
+ */
create_flat_mapping(ttb);
- map_cachable(membase, memsize);
+ /*
+ * There can be SoCs that have a section shared between device memory
+ * and the on-chip RAM hosting the PBL. Thus mark this section
+ * uncachable, but executable.
+ * On such SoCs, executing from OCRAM could cause the instruction
+ * prefetcher to speculatively access that device memory, triggering
+ * potential errant behavior.
+ *
+ * If your SoC has such a memory layout, you should rewrite the code
+ * here to map the OCRAM page-wise.
+ */
+ map_region((unsigned long)_stext, _etext - _stext, PMD_SECT_DEF_UNCACHED);
+
+ /* maps main memory as cachable */
+ map_region(membase, memsize, PMD_SECT_DEF_CACHED);
__mmu_cache_on();
}
diff --git a/arch/arm/cpu/mmu.c b/arch/arm/cpu/mmu.c
index ed27d1e4b654..123e19e9e55c 100644
--- a/arch/arm/cpu/mmu.c
+++ b/arch/arm/cpu/mmu.c
@@ -57,11 +57,13 @@ static inline void tlb_invalidate(void)
}
#define PTE_FLAGS_CACHED_V7 (PTE_EXT_TEX(1) | PTE_BUFFERABLE | PTE_CACHEABLE)
-#define PTE_FLAGS_WC_V7 PTE_EXT_TEX(1)
-#define PTE_FLAGS_UNCACHED_V7 (0)
+#define PTE_FLAGS_WC_V7 (PTE_EXT_TEX(1) | PTE_EXT_XN)
+#define PTE_FLAGS_UNCACHED_V7 PTE_EXT_XN
#define PTE_FLAGS_CACHED_V4 (PTE_SMALL_AP_UNO_SRW | PTE_BUFFERABLE | PTE_CACHEABLE)
#define PTE_FLAGS_UNCACHED_V4 PTE_SMALL_AP_UNO_SRW
-#define PGD_FLAGS_WC_V7 (PMD_SECT_TEX(1) | PMD_TYPE_SECT | PMD_SECT_BUFFERABLE)
+#define PGD_FLAGS_WC_V7 (PMD_SECT_TEX(1) | PMD_TYPE_SECT | PMD_SECT_BUFFERABLE | \
+ PMD_SECT_XN)
+#define PGD_FLAGS_UNCACHED_V7 (PMD_SECT_DEF_UNCACHED | PMD_SECT_XN)
/*
* PTE flags to set cached and uncached areas.
@@ -71,6 +73,7 @@ static uint32_t pte_flags_cached;
static uint32_t pte_flags_wc;
static uint32_t pte_flags_uncached;
static uint32_t pgd_flags_wc;
+static uint32_t pgd_flags_uncached;
#define PTE_MASK ((1 << 12) - 1)
@@ -163,7 +166,7 @@ int arch_remap_range(void *start, size_t size, unsigned flags)
break;
case MAP_UNCACHED:
pte_flags = pte_flags_uncached;
- pgd_flags = PMD_SECT_DEF_UNCACHED;
+ pgd_flags = pgd_flags_uncached;
break;
case ARCH_MAP_WRITECOMBINE:
pte_flags = pte_flags_wc;
@@ -246,7 +249,7 @@ void *map_io_sections(unsigned long phys, void *_start, size_t size)
unsigned long start = (unsigned long)_start, sec;
for (sec = start; sec < start + size; sec += PGDIR_SIZE, phys += PGDIR_SIZE)
- ttb[pgd_index(sec)] = phys | PMD_SECT_DEF_UNCACHED;
+ ttb[pgd_index(sec)] = phys | pgd_flags_uncached;
dma_flush_range(ttb, 0x4000);
tlb_invalidate();
@@ -410,11 +413,13 @@ void __mmu_init(bool mmu_on)
pte_flags_cached = PTE_FLAGS_CACHED_V7;
pte_flags_wc = PTE_FLAGS_WC_V7;
pgd_flags_wc = PGD_FLAGS_WC_V7;
+ pgd_flags_uncached = PGD_FLAGS_UNCACHED_V7;
pte_flags_uncached = PTE_FLAGS_UNCACHED_V7;
} else {
pte_flags_cached = PTE_FLAGS_CACHED_V4;
pte_flags_wc = PTE_FLAGS_UNCACHED_V4;
pgd_flags_wc = PMD_SECT_DEF_UNCACHED;
+ pgd_flags_uncached = PMD_SECT_DEF_UNCACHED;
pte_flags_uncached = PTE_FLAGS_UNCACHED_V4;
}
diff --git a/arch/arm/cpu/mmu.h b/arch/arm/cpu/mmu.h
index 338728aacd3b..c911ee209f51 100644
--- a/arch/arm/cpu/mmu.h
+++ b/arch/arm/cpu/mmu.h
@@ -3,6 +3,7 @@
#include <asm/pgtable.h>
#include <linux/sizes.h>
+#include <asm/system_info.h>
#include "mmu-common.h"
@@ -62,8 +63,13 @@ create_sections(uint32_t *ttb, unsigned long first,
static inline void create_flat_mapping(uint32_t *ttb)
{
+ unsigned int flags = PMD_SECT_DEF_UNCACHED;
+
+ if (cpu_architecture() >= CPU_ARCH_ARMv7)
+ flags |= PMD_SECT_XN;
+
/* create a flat mapping using 1MiB sections */
- create_sections(ttb, 0, 0xffffffff, PMD_SECT_DEF_UNCACHED);
+ create_sections(ttb, 0, 0xffffffff, flags);
}
#endif /* __ARM_MMU_H */
--
2.20.1
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes
2019-04-25 14:32 [PATCH v2 0/4] ARM: mmu: misc armv7 cache/MMU fixes Ahmad Fatoum
` (3 preceding siblings ...)
2019-04-25 14:32 ` [PATCH v2 4/4] ARM: mmu: mark uncached regions as eXecute never on v7 Ahmad Fatoum
@ 2019-04-29 6:59 ` Sascha Hauer
4 siblings, 0 replies; 7+ messages in thread
From: Sascha Hauer @ 2019-04-29 6:59 UTC (permalink / raw)
To: Ahmad Fatoum; +Cc: barebox, lst, sam
On Thu, Apr 25, 2019 at 04:32:28PM +0200, Ahmad Fatoum wrote:
> This series fixes a number of potential caching issues with armv7.
>
> They are:
>
> - Cortex-A7 erratum #814220
> Because of this erratum, the CPU may reorder cache maintenance operations
> when it shouldn't.
> - Wrong Cache invalidation order for Cortex-A7
> On the Cortex-A7, the L2 cache needs to be invalidated before the L1 cache.
> - Device memory isn't marked NX (Never eXecute)
> NX prevents the CPU instruction prefetcher from inadvertently
> accessing memory mapped devices
Applied, thanks
Sascha
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
^ permalink raw reply [flat|nested] 7+ messages in thread