* [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution @ 2025-06-03 9:20 Fabian Pflug 2025-06-03 9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug 2025-06-04 10:00 ` [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Rouven Czerwinski 0 siblings, 2 replies; 9+ messages in thread From: Fabian Pflug @ 2025-06-03 9:20 UTC (permalink / raw) To: barebox; +Cc: lst, rouven.czerwinski, Fabian Pflug, Ahmad Fatoum We expect start_optee_early() to be called with caches disabled, so there is no point in calling sync_caches_for_execution inside. Therefore let's verify the assumption that caches are indeed disabled and drop the useless call. Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> --- arch/arm/lib32/optee-early.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c index 735d829c99..0cda0ab163 100644 --- a/arch/arm/lib32/optee-early.c +++ b/arch/arm/lib32/optee-early.c @@ -7,6 +7,7 @@ */ #include <asm/cache.h> #include <asm/setjmp.h> +#include <asm/system.h> #include <tee/optee.h> #include <debug_ll.h> #include <string.h> @@ -24,13 +25,16 @@ int start_optee_early(void *fdt, void *tee) if (ret < 0) return ret; + /* We expect this function to be called with data caches disabled */ + if (get_cr() & CR_C) + return -ENXIO; + memcpy((void *)hdr->init_load_addr_lo, tee + sizeof(*hdr), hdr->init_size); tee_start = (void *) hdr->init_load_addr_lo; /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ ret = setjmp(tee_buf); if (ret == 0) { - sync_caches_for_execution(); tee_start(0, 0, fdt); longjmp(tee_buf, 1); } -- 2.39.5 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 9:20 [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Fabian Pflug @ 2025-06-03 9:20 ` Fabian Pflug 2025-06-03 9:57 ` Lucas Stach 2025-06-04 10:00 ` [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Rouven Czerwinski 1 sibling, 1 reply; 9+ messages in thread From: Fabian Pflug @ 2025-06-03 9:20 UTC (permalink / raw) To: barebox; +Cc: lst, rouven.czerwinski, Fabian Pflug, Ahmad Fatoum The optee-early code was initially added for i.MX6UL. Trying to naively enable it on an i.MX6Q boards was observed to cause spurious hangs on return from OP-TEE to barebox. The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE enables the MMU and caches with it, but didn't take care to invalidate all cache lines before enabling the MMU, which triggered the aforementioned hangs. To paper over this issue, let's just invalidate the cache lines on the barebox side instead before jumping to OP-TEE. This issue did likely not affect the original i.MX6UL, because its Cortex-A7 has an architected L2 cache that's guaranteed zeroed (no dirty cache lines) on power-on reset, unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on with unpredictable content including the dirty bits. This means on e.g. the i.MX6UL, we will now do one extra cache invalidation that's not needed. This should be negligible and we are already had an unconditional invalidation in __barebox_arm_entry. Note that this is a different implementation than what we do on ARM64, there we load TF-A before it jumps to OP-TEE and assuming non-architected caches or caches with uninitialized content on power-on to be a dying breed, our ARM64 implementation is likely not affected. Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> --- arch/arm/lib32/optee-early.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c index 0cda0ab163..b1dba67d42 100644 --- a/arch/arm/lib32/optee-early.c +++ b/arch/arm/lib32/optee-early.c @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee) /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ ret = setjmp(tee_buf); if (ret == 0) { + /* + * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache + * lines before enabling the MMU. This can lead to spurious hangs + * on return to barebox on systems where there might be left-over + * dirty cache lines, whether from BootROM or because L2 cache + * is non-architected and powers on with unpredictable content + * like is the case with PL310 on i.MX6Q. + * + * Let's invalidate the caches here, so board entry points need + * not bother. + */ + arm_early_mmu_cache_invalidate(); + tee_start(0, 0, fdt); longjmp(tee_buf, 1); } -- 2.39.5 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug @ 2025-06-03 9:57 ` Lucas Stach 2025-06-03 10:18 ` Ahmad Fatoum 0 siblings, 1 reply; 9+ messages in thread From: Lucas Stach @ 2025-06-03 9:57 UTC (permalink / raw) To: Fabian Pflug, barebox; +Cc: rouven.czerwinski, Ahmad Fatoum Hi Fabian, Am Dienstag, dem 03.06.2025 um 11:20 +0200 schrieb Fabian Pflug: > The optee-early code was initially added for i.MX6UL. Trying to naively > enable it on an i.MX6Q boards was observed to cause spurious hangs on > return from OP-TEE to barebox. > > The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE > enables the MMU and caches with it, but didn't take care to invalidate > all cache lines before enabling the MMU, which triggered the > aforementioned hangs. > > To paper over this issue, let's just invalidate the cache lines on the > barebox side instead before jumping to OP-TEE. This issue did likely not > affect the original i.MX6UL, because its Cortex-A7 has an architected L2 > cache that's guaranteed zeroed (no dirty cache lines) on power-on reset, > unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on > with unpredictable content including the dirty bits. > The explanation here doesn't make too much sense to me. I don't think the outer L2 cache is even enabled at this point, but even if it were arm_early_mmu_cache_invalidate() only handles architected caches, so it wouldn't affect the PL310 on the i.MX6Q/DL. The real issue with the Cortex A9 caches is that the tags aren't cleared on power-up, so some sets/ways may end up in "valid" state if not explicitly invalidated. Thus any write to memory may get stuck in the cache, even if caching is disabled, as this knob only turns off allocation in the cache, but doesn't prevent updates of such bogus valid lines. If you then proceed to invalidate the cache, you may discard data that has not yet reached DRAM. So IMO this fix here seems risky, as it assumes that there have been no writes to memory that are worth keeping before calling start_optee_early(). While this might be the case in the current implementation, this assumption is quite non- obvious to someone just looking at the individual functions. The stuck writes are also why OP-TEE is unable to handle this itself: any cache invalidation there would risk discarding writes from software running before OP-TEE. So the only way to handle this properly is to invalidate the caches before issuing any writes. I guess it would be much better to simply have the arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU initialization at the very start of the PBL entry. Regards, Lucas > This means on e.g. the i.MX6UL, we will now do one extra cache invalidation > that's not needed. This should be negligible and we are already had an > unconditional invalidation in __barebox_arm_entry. > > Note that this is a different implementation than what we do on ARM64, > there we load TF-A before it jumps to OP-TEE and assuming > non-architected caches or caches with uninitialized content on power-on > to be a dying breed, our ARM64 implementation is likely not affected. > > Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> > Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> > Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> > --- > arch/arm/lib32/optee-early.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c > index 0cda0ab163..b1dba67d42 100644 > --- a/arch/arm/lib32/optee-early.c > +++ b/arch/arm/lib32/optee-early.c > @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee) > /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ > ret = setjmp(tee_buf); > if (ret == 0) { > + /* > + * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache > + * lines before enabling the MMU. This can lead to spurious hangs > + * on return to barebox on systems where there might be left-over > + * dirty cache lines, whether from BootROM or because L2 cache > + * is non-architected and powers on with unpredictable content > + * like is the case with PL310 on i.MX6Q. > + * > + * Let's invalidate the caches here, so board entry points need > + * not bother. > + */ > + arm_early_mmu_cache_invalidate(); > + > tee_start(0, 0, fdt); > longjmp(tee_buf, 1); > } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 9:57 ` Lucas Stach @ 2025-06-03 10:18 ` Ahmad Fatoum 2025-06-03 14:47 ` Lucas Stach 0 siblings, 1 reply; 9+ messages in thread From: Ahmad Fatoum @ 2025-06-03 10:18 UTC (permalink / raw) To: Lucas Stach, Fabian Pflug, barebox; +Cc: rouven.czerwinski Hello Lucas, On 6/3/25 11:57, Lucas Stach wrote: > Hi Fabian, > > Am Dienstag, dem 03.06.2025 um 11:20 +0200 schrieb Fabian Pflug: >> The optee-early code was initially added for i.MX6UL. Trying to naively >> enable it on an i.MX6Q boards was observed to cause spurious hangs on >> return from OP-TEE to barebox. >> >> The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE >> enables the MMU and caches with it, but didn't take care to invalidate >> all cache lines before enabling the MMU, which triggered the >> aforementioned hangs. >> >> To paper over this issue, let's just invalidate the cache lines on the >> barebox side instead before jumping to OP-TEE. This issue did likely not >> affect the original i.MX6UL, because its Cortex-A7 has an architected L2 >> cache that's guaranteed zeroed (no dirty cache lines) on power-on reset, >> unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on >> with unpredictable content including the dirty bits. >> > The explanation here doesn't make too much sense to me. I don't think > the outer L2 cache is even enabled at this point, but even if it were > arm_early_mmu_cache_invalidate() only handles architected caches, so it > wouldn't affect the PL310 on the i.MX6Q/DL. You're right. I recalled issues that bit us in the past on the Cortex-A9, but not on the A7 and took a wrong turn trying to rationalize this change with a spotty recollection. > The real issue with the Cortex A9 caches is that the tags aren't > cleared on power-up, so some sets/ways may end up in "valid" state if > not explicitly invalidated. I see, thanks for the clarification. So the issue is with our handling of the L1 cache instead. > Thus any write to memory may get stuck in > the cache, even if caching is disabled, as this knob only turns off > allocation in the cache, but doesn't prevent updates of such bogus > valid lines. Ok, so if CR_C is unset, the cache is still used when reading/writing, provided that the cache line is valid. > If you then proceed to invalidate the cache, you may > discard data that has not yet reached DRAM. So IMO this fix here seems > risky, as it assumes that there have been no writes to memory that are > worth keeping before calling start_optee_early(). While this might be > the case in the current implementation, this assumption is quite non- > obvious to someone just looking at the individual functions. Agreed. If the issue is with the valid and not the dirty bit, invalidation at this location is incorrect. > The stuck writes are also why OP-TEE is unable to handle this itself: > any cache invalidation there would risk discarding writes from software > running before OP-TEE. So the only way to handle this properly is to > invalidate the caches before issuing any writes. This makes me wonder though about the regular case without any OP-TEE as we are already doing arm_early_mmu_cache_invalidate() inside __barebox_arm_entry: - low level init code writes something to handoff object or to scratch area - The freshly written data ends up in (L1) cache as tag was valid - arm_early_mmu_cache_invalidate() discards these writes - The uncompressor or barebox proper ends up with corrupted data. We don't have many objects that are accessed both before and after arm_early_mmu_cache_invalidate, so maybe that's why we didn't run into more problems? > I guess it would be much better to simply have the > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU > initialization at the very start of the PBL entry. We don't have a dedicated Cortex-A9 lowlevel entry function unfortunately, just some for specific processors, e.g. the imx6_cpu_lowlevel_init. We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC options and depending on it, add the invalidation to arm_cpu_lowlevel_init()? What do you think? Thanks, Ahmad > > Regards, > Lucas > >> This means on e.g. the i.MX6UL, we will now do one extra cache invalidation >> that's not needed. This should be negligible and we are already had an >> unconditional invalidation in __barebox_arm_entry. >> >> Note that this is a different implementation than what we do on ARM64, >> there we load TF-A before it jumps to OP-TEE and assuming >> non-architected caches or caches with uninitialized content on power-on >> to be a dying breed, our ARM64 implementation is likely not affected. >> >> Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> >> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> >> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> >> --- >> arch/arm/lib32/optee-early.c | 13 +++++++++++++ >> 1 file changed, 13 insertions(+) >> >> diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c >> index 0cda0ab163..b1dba67d42 100644 >> --- a/arch/arm/lib32/optee-early.c >> +++ b/arch/arm/lib32/optee-early.c >> @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee) >> /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ >> ret = setjmp(tee_buf); >> if (ret == 0) { >> + /* >> + * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache >> + * lines before enabling the MMU. This can lead to spurious hangs >> + * on return to barebox on systems where there might be left-over >> + * dirty cache lines, whether from BootROM or because L2 cache >> + * is non-architected and powers on with unpredictable content >> + * like is the case with PL310 on i.MX6Q. >> + * >> + * Let's invalidate the caches here, so board entry points need >> + * not bother. >> + */ >> + arm_early_mmu_cache_invalidate(); >> + >> tee_start(0, 0, fdt); >> longjmp(tee_buf, 1); >> } > > -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 10:18 ` Ahmad Fatoum @ 2025-06-03 14:47 ` Lucas Stach 2025-06-03 14:51 ` Ahmad Fatoum 0 siblings, 1 reply; 9+ messages in thread From: Lucas Stach @ 2025-06-03 14:47 UTC (permalink / raw) To: Ahmad Fatoum, Fabian Pflug, barebox; +Cc: rouven.czerwinski Am Dienstag, dem 03.06.2025 um 12:18 +0200 schrieb Ahmad Fatoum: > Hello Lucas, > > On 6/3/25 11:57, Lucas Stach wrote: > > Hi Fabian, > > > > Am Dienstag, dem 03.06.2025 um 11:20 +0200 schrieb Fabian Pflug: > > > The optee-early code was initially added for i.MX6UL. Trying to naively > > > enable it on an i.MX6Q boards was observed to cause spurious hangs on > > > return from OP-TEE to barebox. > > > > > > The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE > > > enables the MMU and caches with it, but didn't take care to invalidate > > > all cache lines before enabling the MMU, which triggered the > > > aforementioned hangs. > > > > > > To paper over this issue, let's just invalidate the cache lines on the > > > barebox side instead before jumping to OP-TEE. This issue did likely not > > > affect the original i.MX6UL, because its Cortex-A7 has an architected L2 > > > cache that's guaranteed zeroed (no dirty cache lines) on power-on reset, > > > unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on > > > with unpredictable content including the dirty bits. > > > > > The explanation here doesn't make too much sense to me. I don't think > > the outer L2 cache is even enabled at this point, but even if it were > > arm_early_mmu_cache_invalidate() only handles architected caches, so it > > wouldn't affect the PL310 on the i.MX6Q/DL. > > You're right. I recalled issues that bit us in the past on the > Cortex-A9, but not on the A7 and took a wrong turn trying to rationalize > this change with a spotty recollection. > > > The real issue with the Cortex A9 caches is that the tags aren't > > cleared on power-up, so some sets/ways may end up in "valid" state if > > not explicitly invalidated. > > I see, thanks for the clarification. So the issue is with our handling > of the L1 cache instead. > > > Thus any write to memory may get stuck in > > the cache, even if caching is disabled, as this knob only turns off > > allocation in the cache, but doesn't prevent updates of such bogus > > valid lines. > > Ok, so if CR_C is unset, the cache is still used when reading/writing, > provided that the cache line is valid. > Exactly. Clearing CR_C disables cache allocation, but lookups in the cache still proceed as normal. > > If you then proceed to invalidate the cache, you may > > discard data that has not yet reached DRAM. So IMO this fix here seems > > risky, as it assumes that there have been no writes to memory that are > > worth keeping before calling start_optee_early(). While this might be > > the case in the current implementation, this assumption is quite non- > > obvious to someone just looking at the individual functions. > > Agreed. If the issue is with the valid and not the dirty bit, > invalidation at this location is incorrect. > > > The stuck writes are also why OP-TEE is unable to handle this itself: > > any cache invalidation there would risk discarding writes from software > > running before OP-TEE. So the only way to handle this properly is to > > invalidate the caches before issuing any writes. > > This makes me wonder though about the regular case without any OP-TEE as > we are already doing arm_early_mmu_cache_invalidate() inside > __barebox_arm_entry: > > - low level init code writes something to handoff object or to scratch > area > > - The freshly written data ends up in (L1) cache as tag was valid > > - arm_early_mmu_cache_invalidate() discards these writes > > - The uncompressor or barebox proper ends up with corrupted data. > > We don't have many objects that are accessed both before and after > arm_early_mmu_cache_invalidate, so maybe that's why we didn't run into > more problems? > Yea, I would guess that the probability of hitting this issue with the handoff data, which isn't that big, is quite low. At least from the description above I think we can hit the same issues with the handoff data. > > I guess it would be much better to simply have the > > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU > > initialization at the very start of the PBL entry. > > We don't have a dedicated Cortex-A9 lowlevel entry function > unfortunately, just some for specific processors, e.g. the > imx6_cpu_lowlevel_init. > > We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC > options and depending on it, add the invalidation to > arm_cpu_lowlevel_init()? What do you think? > This would then trigger the invalidation even on systems that don't need it in case of a multiarch Barebox. There aren't that many Cortex A9 based SoCs supported in Barebox and all of them should have a SoC specific init function to apply the necessary workarounds, so I think it would be fine to call the cache invalidate from the SoC specific lowlevel init of those few SoCs? Regards, Lucas > Thanks, > Ahmad > > > > > > Regards, > > Lucas > > > > > This means on e.g. the i.MX6UL, we will now do one extra cache invalidation > > > that's not needed. This should be negligible and we are already had an > > > unconditional invalidation in __barebox_arm_entry. > > > > > > Note that this is a different implementation than what we do on ARM64, > > > there we load TF-A before it jumps to OP-TEE and assuming > > > non-architected caches or caches with uninitialized content on power-on > > > to be a dying breed, our ARM64 implementation is likely not affected. > > > > > > Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> > > > Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> > > > Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> > > > --- > > > arch/arm/lib32/optee-early.c | 13 +++++++++++++ > > > 1 file changed, 13 insertions(+) > > > > > > diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c > > > index 0cda0ab163..b1dba67d42 100644 > > > --- a/arch/arm/lib32/optee-early.c > > > +++ b/arch/arm/lib32/optee-early.c > > > @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee) > > > /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ > > > ret = setjmp(tee_buf); > > > if (ret == 0) { > > > + /* > > > + * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache > > > + * lines before enabling the MMU. This can lead to spurious hangs > > > + * on return to barebox on systems where there might be left-over > > > + * dirty cache lines, whether from BootROM or because L2 cache > > > + * is non-architected and powers on with unpredictable content > > > + * like is the case with PL310 on i.MX6Q. > > > + * > > > + * Let's invalidate the caches here, so board entry points need > > > + * not bother. > > > + */ > > > + arm_early_mmu_cache_invalidate(); > > > + > > > tee_start(0, 0, fdt); > > > longjmp(tee_buf, 1); > > > } > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 14:47 ` Lucas Stach @ 2025-06-03 14:51 ` Ahmad Fatoum 2025-06-03 15:20 ` Lucas Stach 0 siblings, 1 reply; 9+ messages in thread From: Ahmad Fatoum @ 2025-06-03 14:51 UTC (permalink / raw) To: Lucas Stach, Fabian Pflug, barebox; +Cc: rouven.czerwinski Hello Lucas, On 6/3/25 16:47, Lucas Stach wrote: > Am Dienstag, dem 03.06.2025 um 12:18 +0200 schrieb Ahmad Fatoum: >> On 6/3/25 11:57, Lucas Stach wrote: >> Ok, so if CR_C is unset, the cache is still used when reading/writing, >> provided that the cache line is valid. >> > Exactly. Clearing CR_C disables cache allocation, but lookups in the > cache still proceed as normal. I hope to remember that this time. I am fairly sure you explained this to me once before already... >> We don't have many objects that are accessed both before and after >> arm_early_mmu_cache_invalidate, so maybe that's why we didn't run into >> more problems? >> > Yea, I would guess that the probability of hitting this issue with the > handoff data, which isn't that big, is quite low. At least from the > description above I think we can hit the same issues with the handoff > data. Ack. >>> I guess it would be much better to simply have the >>> arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU >>> initialization at the very start of the PBL entry. >> >> We don't have a dedicated Cortex-A9 lowlevel entry function >> unfortunately, just some for specific processors, e.g. the >> imx6_cpu_lowlevel_init. >> >> We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC >> options and depending on it, add the invalidation to >> arm_cpu_lowlevel_init()? What do you think? >> > This would then trigger the invalidation even on systems that don't > need it in case of a multiarch Barebox. There aren't that many Cortex > A9 based SoCs supported in Barebox and all of them should have a SoC > specific init function to apply the necessary workarounds, so I think > it would be fine to call the cache invalidate from the SoC specific > lowlevel init of those few SoCs? Fair enough. How do we know we only need this for Cortex-A9 though? Couldn't e.g. the Cortex-A8 also be affected? Cheers, Ahmad > > Regards, > Lucas > >> Thanks, >> Ahmad >> >> >>> >>> Regards, >>> Lucas >>> >>>> This means on e.g. the i.MX6UL, we will now do one extra cache invalidation >>>> that's not needed. This should be negligible and we are already had an >>>> unconditional invalidation in __barebox_arm_entry. >>>> >>>> Note that this is a different implementation than what we do on ARM64, >>>> there we load TF-A before it jumps to OP-TEE and assuming >>>> non-architected caches or caches with uninitialized content on power-on >>>> to be a dying breed, our ARM64 implementation is likely not affected. >>>> >>>> Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> >>>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> >>>> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> >>>> --- >>>> arch/arm/lib32/optee-early.c | 13 +++++++++++++ >>>> 1 file changed, 13 insertions(+) >>>> >>>> diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c >>>> index 0cda0ab163..b1dba67d42 100644 >>>> --- a/arch/arm/lib32/optee-early.c >>>> +++ b/arch/arm/lib32/optee-early.c >>>> @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee) >>>> /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ >>>> ret = setjmp(tee_buf); >>>> if (ret == 0) { >>>> + /* >>>> + * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache >>>> + * lines before enabling the MMU. This can lead to spurious hangs >>>> + * on return to barebox on systems where there might be left-over >>>> + * dirty cache lines, whether from BootROM or because L2 cache >>>> + * is non-architected and powers on with unpredictable content >>>> + * like is the case with PL310 on i.MX6Q. >>>> + * >>>> + * Let's invalidate the caches here, so board entry points need >>>> + * not bother. >>>> + */ >>>> + arm_early_mmu_cache_invalidate(); >>>> + >>>> tee_start(0, 0, fdt); >>>> longjmp(tee_buf, 1); >>>> } >>> >>> >> > > -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 14:51 ` Ahmad Fatoum @ 2025-06-03 15:20 ` Lucas Stach 2025-06-04 9:57 ` Rouven Czerwinski 0 siblings, 1 reply; 9+ messages in thread From: Lucas Stach @ 2025-06-03 15:20 UTC (permalink / raw) To: Ahmad Fatoum, Fabian Pflug, barebox; +Cc: rouven.czerwinski Am Dienstag, dem 03.06.2025 um 16:51 +0200 schrieb Ahmad Fatoum: [...] > > > > I guess it would be much better to simply have the > > > > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU > > > > initialization at the very start of the PBL entry. > > > > > > We don't have a dedicated Cortex-A9 lowlevel entry function > > > unfortunately, just some for specific processors, e.g. the > > > imx6_cpu_lowlevel_init. > > > > > > We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC > > > options and depending on it, add the invalidation to > > > arm_cpu_lowlevel_init()? What do you think? > > > > > This would then trigger the invalidation even on systems that don't > > need it in case of a multiarch Barebox. There aren't that many Cortex > > A9 based SoCs supported in Barebox and all of them should have a SoC > > specific init function to apply the necessary workarounds, so I think > > it would be fine to call the cache invalidate from the SoC specific > > lowlevel init of those few SoCs? > > Fair enough. How do we know we only need this for Cortex-A9 though? > Couldn't e.g. the Cortex-A8 also be affected? We can't be 100% sure without specific knowledge about each SoC integration. Both the Cortex A8 [1] and Cortex A15 [2] TRMs define a reset sequence that mandates the straps to be set in such a way that the processor will clear all L1 and L2 memory arrays on power-on reset. The only odd one where the TRM doesn't even mention memory arrays in the reset sequence is the Cortex A9 [3], which pretty much lines up with the number of SoCs where we have seen issues due to uninitialized cache content. Regards, Lucas [1] https://developer.arm.com/documentation/ddi0344/k/Cihcbcgi [2] https://developer.arm.com/documentation/ddi0438/i/functional-description/clocking-and-resets/resets [3] https://developer.arm.com/documentation/100511/0401/functional-description/clocking-and-resets/reset ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE 2025-06-03 15:20 ` Lucas Stach @ 2025-06-04 9:57 ` Rouven Czerwinski 0 siblings, 0 replies; 9+ messages in thread From: Rouven Czerwinski @ 2025-06-04 9:57 UTC (permalink / raw) To: Lucas Stach; +Cc: Ahmad Fatoum, Fabian Pflug, barebox Hi, On Tue, 3 Jun 2025 at 17:20, Lucas Stach <l.stach@pengutronix.de> wrote: > > Am Dienstag, dem 03.06.2025 um 16:51 +0200 schrieb Ahmad Fatoum: > [...] > > > > > I guess it would be much better to simply have the > > > > > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU > > > > > initialization at the very start of the PBL entry. To add my few cents: I have had customer systems where without cache invalidation, imx-usb-loader would sporadically hang. This hang was fixed by calling arm_early_mmu_cache_invalidate() in the lowlevel pbl code as well. Barebox already has i.MX6 boards which to call the invalidation early in PBL, but unfortunately none have a comment or commit description why. > > > > > > > > We don't have a dedicated Cortex-A9 lowlevel entry function > > > > unfortunately, just some for specific processors, e.g. the > > > > imx6_cpu_lowlevel_init. > > > > > > > > We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC > > > > options and depending on it, add the invalidation to > > > > arm_cpu_lowlevel_init()? What do you think? > > > > > > > This would then trigger the invalidation even on systems that don't > > > need it in case of a multiarch Barebox. There aren't that many Cortex > > > A9 based SoCs supported in Barebox and all of them should have a SoC > > > specific init function to apply the necessary workarounds, so I think > > > it would be fine to call the cache invalidate from the SoC specific > > > lowlevel init of those few SoCs? > > > > Fair enough. How do we know we only need this for Cortex-A9 though? > > Couldn't e.g. the Cortex-A8 also be affected? > > We can't be 100% sure without specific knowledge about each SoC > integration. Both the Cortex A8 [1] and Cortex A15 [2] TRMs define a > reset sequence that mandates the straps to be set in such a way that > the processor will clear all L1 and L2 memory arrays on power-on reset. > > The only odd one where the TRM doesn't even mention memory arrays in > the reset sequence is the Cortex A9 [3], which pretty much lines up > with the number of SoCs where we have seen issues due to uninitialized > cache content. > > Regards, > Lucas > > [1] https://developer.arm.com/documentation/ddi0344/k/Cihcbcgi > [2] https://developer.arm.com/documentation/ddi0438/i/functional-description/clocking-and-resets/resets > [3] https://developer.arm.com/documentation/100511/0401/functional-description/clocking-and-resets/reset Adding this to imx6_cpu_lowlevel_init() is IMO the correct way as well. - Rouven ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution 2025-06-03 9:20 [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Fabian Pflug 2025-06-03 9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug @ 2025-06-04 10:00 ` Rouven Czerwinski 1 sibling, 0 replies; 9+ messages in thread From: Rouven Czerwinski @ 2025-06-04 10:00 UTC (permalink / raw) To: Fabian Pflug; +Cc: barebox, lst, Ahmad Fatoum Hi, On Tue, 3 Jun 2025 at 11:20, Fabian Pflug <f.pflug@pengutronix.de> wrote: > > We expect start_optee_early() to be called with caches disabled, so there > is no point in calling sync_caches_for_execution inside. > > Therefore let's verify the assumption that caches are indeed disabled > and drop the useless call. > > Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de> > Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> > Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de> > --- > arch/arm/lib32/optee-early.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c > index 735d829c99..0cda0ab163 100644 > --- a/arch/arm/lib32/optee-early.c > +++ b/arch/arm/lib32/optee-early.c > @@ -7,6 +7,7 @@ > */ > #include <asm/cache.h> > #include <asm/setjmp.h> > +#include <asm/system.h> > #include <tee/optee.h> > #include <debug_ll.h> > #include <string.h> > @@ -24,13 +25,16 @@ int start_optee_early(void *fdt, void *tee) > if (ret < 0) > return ret; > > + /* We expect this function to be called with data caches disabled */ > + if (get_cr() & CR_C) > + return -ENXIO; > + I don't think returning an error here is correct, we should probably just panic() with a useful error message. None of the current callers of the function check the return code, so at best the board starts without OP-TEE, at worst it randomly hangs because no PSCI provider can be found. > memcpy((void *)hdr->init_load_addr_lo, tee + sizeof(*hdr), hdr->init_size); > tee_start = (void *) hdr->init_load_addr_lo; > > /* We use setjmp/longjmp here because OP-TEE clobbers most registers */ > ret = setjmp(tee_buf); > if (ret == 0) { > - sync_caches_for_execution(); > tee_start(0, 0, fdt); > longjmp(tee_buf, 1); > } > -- > 2.39.5 > - Rouven ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-06-04 10:01 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-06-03 9:20 [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Fabian Pflug 2025-06-03 9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug 2025-06-03 9:57 ` Lucas Stach 2025-06-03 10:18 ` Ahmad Fatoum 2025-06-03 14:47 ` Lucas Stach 2025-06-03 14:51 ` Ahmad Fatoum 2025-06-03 15:20 ` Lucas Stach 2025-06-04 9:57 ` Rouven Czerwinski 2025-06-04 10:00 ` [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Rouven Czerwinski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox