[PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for

mail archive of the barebox mailing list
 help / color / mirror / Atom feed

* [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution
@ 2025-06-03  9:20 Fabian Pflug
  2025-06-03  9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug
  2025-06-04 10:00 ` [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Rouven Czerwinski
  0 siblings, 2 replies; 9+ messages in thread
From: Fabian Pflug @ 2025-06-03  9:20 UTC (permalink / raw)
  To: barebox; +Cc: lst, rouven.czerwinski, Fabian Pflug, Ahmad Fatoum

We expect start_optee_early() to be called with caches disabled, so there
is no point in calling sync_caches_for_execution inside.

Therefore let's verify the assumption that caches are indeed disabled
and drop the useless call.

Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
---
 arch/arm/lib32/optee-early.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
index 735d829c99..0cda0ab163 100644
--- a/arch/arm/lib32/optee-early.c
+++ b/arch/arm/lib32/optee-early.c
@@ -7,6 +7,7 @@
  */
 #include <asm/cache.h>
 #include <asm/setjmp.h>
+#include <asm/system.h>
 #include <tee/optee.h>
 #include <debug_ll.h>
 #include <string.h>
@@ -24,13 +25,16 @@ int start_optee_early(void *fdt, void *tee)
 	if (ret < 0)
 		return ret;
 
+	/* We expect this function to be called with data caches disabled */
+	if (get_cr() & CR_C)
+		return -ENXIO;
+
 	memcpy((void *)hdr->init_load_addr_lo, tee + sizeof(*hdr), hdr->init_size);
 	tee_start = (void *) hdr->init_load_addr_lo;
 
 	/* We use setjmp/longjmp here because OP-TEE clobbers most registers */
 	ret = setjmp(tee_buf);
 	if (ret == 0) {
-		sync_caches_for_execution();
 		tee_start(0, 0, fdt);
 		longjmp(tee_buf, 1);
 	}
-- 
2.39.5




^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03  9:20 [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Fabian Pflug
@ 2025-06-03  9:20 ` Fabian Pflug
  2025-06-03  9:57   ` Lucas Stach
  2025-06-04 10:00 ` [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Rouven Czerwinski
  1 sibling, 1 reply; 9+ messages in thread
From: Fabian Pflug @ 2025-06-03  9:20 UTC (permalink / raw)
  To: barebox; +Cc: lst, rouven.czerwinski, Fabian Pflug, Ahmad Fatoum

The optee-early code was initially added for i.MX6UL. Trying to naively
enable it on an i.MX6Q boards was observed to cause spurious hangs on
return from OP-TEE to barebox.

The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE
enables the MMU and caches with it, but didn't take care to invalidate
all cache lines before enabling the MMU, which triggered the
aforementioned hangs.

To paper over this issue, let's just invalidate the cache lines on the
barebox side instead before jumping to OP-TEE. This issue did likely not
affect the original i.MX6UL, because its Cortex-A7 has an architected L2
cache that's guaranteed zeroed (no dirty cache lines) on power-on reset,
unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on
with unpredictable content including the dirty bits.

This means on e.g. the i.MX6UL, we will now do one extra cache invalidation
that's not needed. This should be negligible and we are already had an
unconditional invalidation in __barebox_arm_entry.

Note that this is a different implementation than what we do on ARM64,
there we load TF-A before it jumps to OP-TEE and assuming
non-architected caches or caches with uninitialized content on power-on
to be a dying breed, our ARM64 implementation is likely not affected.

Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
---
 arch/arm/lib32/optee-early.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
index 0cda0ab163..b1dba67d42 100644
--- a/arch/arm/lib32/optee-early.c
+++ b/arch/arm/lib32/optee-early.c
@@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee)
 	/* We use setjmp/longjmp here because OP-TEE clobbers most registers */
 	ret = setjmp(tee_buf);
 	if (ret == 0) {
+		/*
+		 * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache
+		 * lines before enabling the MMU. This can lead to spurious hangs
+		 * on return to barebox on systems where there might be left-over
+		 * dirty cache lines, whether from BootROM or because L2 cache
+		 * is non-architected and powers on with unpredictable content
+		 * like is the case with PL310 on i.MX6Q.
+		 *
+		 * Let's invalidate the caches here, so board entry points need
+		 * not bother.
+		 */
+		arm_early_mmu_cache_invalidate();
+
 		tee_start(0, 0, fdt);
 		longjmp(tee_buf, 1);
 	}
-- 
2.39.5

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03  9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug
@ 2025-06-03  9:57   ` Lucas Stach
  2025-06-03 10:18     ` Ahmad Fatoum
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas Stach @ 2025-06-03  9:57 UTC (permalink / raw)
  To: Fabian Pflug, barebox; +Cc: rouven.czerwinski, Ahmad Fatoum

Hi Fabian,

Am Dienstag, dem 03.06.2025 um 11:20 +0200 schrieb Fabian Pflug:
> The optee-early code was initially added for i.MX6UL. Trying to naively
> enable it on an i.MX6Q boards was observed to cause spurious hangs on
> return from OP-TEE to barebox.
> 
> The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE
> enables the MMU and caches with it, but didn't take care to invalidate
> all cache lines before enabling the MMU, which triggered the
> aforementioned hangs.
> 
> To paper over this issue, let's just invalidate the cache lines on the
> barebox side instead before jumping to OP-TEE. This issue did likely not
> affect the original i.MX6UL, because its Cortex-A7 has an architected L2
> cache that's guaranteed zeroed (no dirty cache lines) on power-on reset,
> unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on
> with unpredictable content including the dirty bits.
> 
The explanation here doesn't make too much sense to me. I don't think
the outer L2 cache is even enabled at this point, but even if it were
arm_early_mmu_cache_invalidate() only handles architected caches, so it
wouldn't affect the PL310 on the i.MX6Q/DL.

The real issue with the Cortex A9 caches is that the tags aren't
cleared on power-up, so some sets/ways may end up in "valid" state if
not explicitly invalidated. Thus any write to memory may get stuck in
the cache, even if caching is disabled, as this knob only turns off 
allocation in the cache, but doesn't prevent updates of such bogus
valid lines. If you then proceed to invalidate the cache, you may
discard data that has not yet reached DRAM. So IMO this fix here seems
risky, as it assumes that there have been no writes to memory that are
worth keeping before calling start_optee_early(). While this might be
the case in the current implementation, this assumption is quite non-
obvious to someone just looking at the individual functions.

The stuck writes are also why OP-TEE is unable to handle this itself:
any cache invalidation there would risk discarding writes from software
running before OP-TEE. So the only way to handle this properly is to
invalidate the caches before issuing any writes.

I guess it would be much better to simply have the
arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU
initialization at the very start of the PBL entry.

Regards,
Lucas

> This means on e.g. the i.MX6UL, we will now do one extra cache invalidation
> that's not needed. This should be negligible and we are already had an
> unconditional invalidation in __barebox_arm_entry.
> 
> Note that this is a different implementation than what we do on ARM64,
> there we load TF-A before it jumps to OP-TEE and assuming
> non-architected caches or caches with uninitialized content on power-on
> to be a dying breed, our ARM64 implementation is likely not affected.
> 
> Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
> ---
>  arch/arm/lib32/optee-early.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
> index 0cda0ab163..b1dba67d42 100644
> --- a/arch/arm/lib32/optee-early.c
> +++ b/arch/arm/lib32/optee-early.c
> @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee)
>  	/* We use setjmp/longjmp here because OP-TEE clobbers most registers */
>  	ret = setjmp(tee_buf);
>  	if (ret == 0) {
> +		/*
> +		 * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache
> +		 * lines before enabling the MMU. This can lead to spurious hangs
> +		 * on return to barebox on systems where there might be left-over
> +		 * dirty cache lines, whether from BootROM or because L2 cache
> +		 * is non-architected and powers on with unpredictable content
> +		 * like is the case with PL310 on i.MX6Q.
> +		 *
> +		 * Let's invalidate the caches here, so board entry points need
> +		 * not bother.
> +		 */
> +		arm_early_mmu_cache_invalidate();
> +
>  		tee_start(0, 0, fdt);
>  		longjmp(tee_buf, 1);
>  	}




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03  9:57   ` Lucas Stach
@ 2025-06-03 10:18     ` Ahmad Fatoum
  2025-06-03 14:47       ` Lucas Stach
  0 siblings, 1 reply; 9+ messages in thread
From: Ahmad Fatoum @ 2025-06-03 10:18 UTC (permalink / raw)
  To: Lucas Stach, Fabian Pflug, barebox; +Cc: rouven.czerwinski

Hello Lucas,

On 6/3/25 11:57, Lucas Stach wrote:
> Hi Fabian,
> 
> Am Dienstag, dem 03.06.2025 um 11:20 +0200 schrieb Fabian Pflug:
>> The optee-early code was initially added for i.MX6UL. Trying to naively
>> enable it on an i.MX6Q boards was observed to cause spurious hangs on
>> return from OP-TEE to barebox.
>>
>> The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE
>> enables the MMU and caches with it, but didn't take care to invalidate
>> all cache lines before enabling the MMU, which triggered the
>> aforementioned hangs.
>>
>> To paper over this issue, let's just invalidate the cache lines on the
>> barebox side instead before jumping to OP-TEE. This issue did likely not
>> affect the original i.MX6UL, because its Cortex-A7 has an architected L2
>> cache that's guaranteed zeroed (no dirty cache lines) on power-on reset,
>> unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on
>> with unpredictable content including the dirty bits.
>>
> The explanation here doesn't make too much sense to me. I don't think
> the outer L2 cache is even enabled at this point, but even if it were
> arm_early_mmu_cache_invalidate() only handles architected caches, so it
> wouldn't affect the PL310 on the i.MX6Q/DL.

You're right. I recalled issues that bit us in the past on the
Cortex-A9, but not on the A7 and took a wrong turn trying to rationalize
this change with a spotty recollection.

> The real issue with the Cortex A9 caches is that the tags aren't
> cleared on power-up, so some sets/ways may end up in "valid" state if
> not explicitly invalidated.

I see, thanks for the clarification. So the issue is with our handling
of the L1 cache instead.

> Thus any write to memory may get stuck in
> the cache, even if caching is disabled, as this knob only turns off 
> allocation in the cache, but doesn't prevent updates of such bogus
> valid lines.

Ok, so if CR_C is unset, the cache is still used when reading/writing,
provided that the cache line is valid.

> If you then proceed to invalidate the cache, you may
> discard data that has not yet reached DRAM. So IMO this fix here seems
> risky, as it assumes that there have been no writes to memory that are
> worth keeping before calling start_optee_early(). While this might be
> the case in the current implementation, this assumption is quite non-
> obvious to someone just looking at the individual functions.

Agreed. If the issue is with the valid and not the dirty bit,
invalidation at this location is incorrect.

> The stuck writes are also why OP-TEE is unable to handle this itself:
> any cache invalidation there would risk discarding writes from software
> running before OP-TEE. So the only way to handle this properly is to
> invalidate the caches before issuing any writes.

This makes me wonder though about the regular case without any OP-TEE as
we are already doing arm_early_mmu_cache_invalidate() inside
__barebox_arm_entry:

 - low level init code writes something to handoff object or to scratch
   area

 - The freshly written data ends up in (L1) cache as tag was valid

 - arm_early_mmu_cache_invalidate() discards these writes

 - The uncompressor or barebox proper ends up with corrupted data.

We don't have many objects that are accessed both before and after
arm_early_mmu_cache_invalidate, so maybe that's why we didn't run into
more problems?

> I guess it would be much better to simply have the
> arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU
> initialization at the very start of the PBL entry.

We don't have a dedicated Cortex-A9 lowlevel entry function
unfortunately, just some for specific processors, e.g. the
imx6_cpu_lowlevel_init.

We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC
options and depending on it, add the invalidation to
arm_cpu_lowlevel_init()? What do you think?

Thanks,
Ahmad


> 
> Regards,
> Lucas
> 
>> This means on e.g. the i.MX6UL, we will now do one extra cache invalidation
>> that's not needed. This should be negligible and we are already had an
>> unconditional invalidation in __barebox_arm_entry.
>>
>> Note that this is a different implementation than what we do on ARM64,
>> there we load TF-A before it jumps to OP-TEE and assuming
>> non-architected caches or caches with uninitialized content on power-on
>> to be a dying breed, our ARM64 implementation is likely not affected.
>>
>> Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
>> ---
>>  arch/arm/lib32/optee-early.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
>> index 0cda0ab163..b1dba67d42 100644
>> --- a/arch/arm/lib32/optee-early.c
>> +++ b/arch/arm/lib32/optee-early.c
>> @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee)
>>  	/* We use setjmp/longjmp here because OP-TEE clobbers most registers */
>>  	ret = setjmp(tee_buf);
>>  	if (ret == 0) {
>> +		/*
>> +		 * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache
>> +		 * lines before enabling the MMU. This can lead to spurious hangs
>> +		 * on return to barebox on systems where there might be left-over
>> +		 * dirty cache lines, whether from BootROM or because L2 cache
>> +		 * is non-architected and powers on with unpredictable content
>> +		 * like is the case with PL310 on i.MX6Q.
>> +		 *
>> +		 * Let's invalidate the caches here, so board entry points need
>> +		 * not bother.
>> +		 */
>> +		arm_early_mmu_cache_invalidate();
>> +
>>  		tee_start(0, 0, fdt);
>>  		longjmp(tee_buf, 1);
>>  	}
> 
> 

-- 
Pengutronix e.K.                  |                             |
Steuerwalder Str. 21              | http://www.pengutronix.de/  |
31137 Hildesheim, Germany         | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686  | Fax:   +49-5121-206917-5555 |




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03 10:18     ` Ahmad Fatoum
@ 2025-06-03 14:47       ` Lucas Stach
  2025-06-03 14:51         ` Ahmad Fatoum
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas Stach @ 2025-06-03 14:47 UTC (permalink / raw)
  To: Ahmad Fatoum, Fabian Pflug, barebox; +Cc: rouven.czerwinski

Am Dienstag, dem 03.06.2025 um 12:18 +0200 schrieb Ahmad Fatoum:
> Hello Lucas,
> 
> On 6/3/25 11:57, Lucas Stach wrote:
> > Hi Fabian,
> > 
> > Am Dienstag, dem 03.06.2025 um 11:20 +0200 schrieb Fabian Pflug:
> > > The optee-early code was initially added for i.MX6UL. Trying to naively
> > > enable it on an i.MX6Q boards was observed to cause spurious hangs on
> > > return from OP-TEE to barebox.
> > > 
> > > The root cause seems to be inadequate cache handling by OP-TEE: OP-TEE
> > > enables the MMU and caches with it, but didn't take care to invalidate
> > > all cache lines before enabling the MMU, which triggered the
> > > aforementioned hangs.
> > > 
> > > To paper over this issue, let's just invalidate the cache lines on the
> > > barebox side instead before jumping to OP-TEE. This issue did likely not
> > > affect the original i.MX6UL, because its Cortex-A7 has an architected L2
> > > cache that's guaranteed zeroed (no dirty cache lines) on power-on reset,
> > > unlike the i.MX6Q's Cortex-A9, where the external L2 cache powers on
> > > with unpredictable content including the dirty bits.
> > > 
> > The explanation here doesn't make too much sense to me. I don't think
> > the outer L2 cache is even enabled at this point, but even if it were
> > arm_early_mmu_cache_invalidate() only handles architected caches, so it
> > wouldn't affect the PL310 on the i.MX6Q/DL.
> 
> You're right. I recalled issues that bit us in the past on the
> Cortex-A9, but not on the A7 and took a wrong turn trying to rationalize
> this change with a spotty recollection.
> 
> > The real issue with the Cortex A9 caches is that the tags aren't
> > cleared on power-up, so some sets/ways may end up in "valid" state if
> > not explicitly invalidated.
> 
> I see, thanks for the clarification. So the issue is with our handling
> of the L1 cache instead.
> 
> > Thus any write to memory may get stuck in
> > the cache, even if caching is disabled, as this knob only turns off 
> > allocation in the cache, but doesn't prevent updates of such bogus
> > valid lines.
> 
> Ok, so if CR_C is unset, the cache is still used when reading/writing,
> provided that the cache line is valid.
> 
Exactly. Clearing CR_C disables cache allocation, but lookups in the
cache still proceed as normal.

> > If you then proceed to invalidate the cache, you may
> > discard data that has not yet reached DRAM. So IMO this fix here seems
> > risky, as it assumes that there have been no writes to memory that are
> > worth keeping before calling start_optee_early(). While this might be
> > the case in the current implementation, this assumption is quite non-
> > obvious to someone just looking at the individual functions.
> 
> Agreed. If the issue is with the valid and not the dirty bit,
> invalidation at this location is incorrect.
> 
> > The stuck writes are also why OP-TEE is unable to handle this itself:
> > any cache invalidation there would risk discarding writes from software
> > running before OP-TEE. So the only way to handle this properly is to
> > invalidate the caches before issuing any writes.
> 
> This makes me wonder though about the regular case without any OP-TEE as
> we are already doing arm_early_mmu_cache_invalidate() inside
> __barebox_arm_entry:
> 
>  - low level init code writes something to handoff object or to scratch
>    area
> 
>  - The freshly written data ends up in (L1) cache as tag was valid
> 
>  - arm_early_mmu_cache_invalidate() discards these writes
> 
>  - The uncompressor or barebox proper ends up with corrupted data.
> 
> We don't have many objects that are accessed both before and after
> arm_early_mmu_cache_invalidate, so maybe that's why we didn't run into
> more problems?
> 
Yea, I would guess that the probability of hitting this issue with the
handoff data, which isn't that big, is quite low. At least from the
description above I think we can hit the same issues with the handoff
data.

> > I guess it would be much better to simply have the
> > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU
> > initialization at the very start of the PBL entry.
> 
> We don't have a dedicated Cortex-A9 lowlevel entry function
> unfortunately, just some for specific processors, e.g. the
> imx6_cpu_lowlevel_init.
> 
> We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC
> options and depending on it, add the invalidation to
> arm_cpu_lowlevel_init()? What do you think?
> 
This would then trigger the invalidation even on systems that don't
need it in case of a multiarch Barebox. There aren't that many Cortex
A9 based SoCs supported in Barebox and all of them should have a SoC
specific init function to apply the necessary workarounds, so I think
it would be fine to call the cache invalidate from the SoC specific
lowlevel init of those few SoCs?

Regards,
Lucas

> Thanks,
> Ahmad
> 
> 
> > 
> > Regards,
> > Lucas
> > 
> > > This means on e.g. the i.MX6UL, we will now do one extra cache invalidation
> > > that's not needed. This should be negligible and we are already had an
> > > unconditional invalidation in __barebox_arm_entry.
> > > 
> > > Note that this is a different implementation than what we do on ARM64,
> > > there we load TF-A before it jumps to OP-TEE and assuming
> > > non-architected caches or caches with uninitialized content on power-on
> > > to be a dying breed, our ARM64 implementation is likely not affected.
> > > 
> > > Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> > > Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> > > Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
> > > ---
> > >  arch/arm/lib32/optee-early.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
> > > index 0cda0ab163..b1dba67d42 100644
> > > --- a/arch/arm/lib32/optee-early.c
> > > +++ b/arch/arm/lib32/optee-early.c
> > > @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee)
> > >  	/* We use setjmp/longjmp here because OP-TEE clobbers most registers */
> > >  	ret = setjmp(tee_buf);
> > >  	if (ret == 0) {
> > > +		/*
> > > +		 * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache
> > > +		 * lines before enabling the MMU. This can lead to spurious hangs
> > > +		 * on return to barebox on systems where there might be left-over
> > > +		 * dirty cache lines, whether from BootROM or because L2 cache
> > > +		 * is non-architected and powers on with unpredictable content
> > > +		 * like is the case with PL310 on i.MX6Q.
> > > +		 *
> > > +		 * Let's invalidate the caches here, so board entry points need
> > > +		 * not bother.
> > > +		 */
> > > +		arm_early_mmu_cache_invalidate();
> > > +
> > >  		tee_start(0, 0, fdt);
> > >  		longjmp(tee_buf, 1);
> > >  	}
> > 
> > 
> 




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03 14:47       ` Lucas Stach
@ 2025-06-03 14:51         ` Ahmad Fatoum
  2025-06-03 15:20           ` Lucas Stach
  0 siblings, 1 reply; 9+ messages in thread
From: Ahmad Fatoum @ 2025-06-03 14:51 UTC (permalink / raw)
  To: Lucas Stach, Fabian Pflug, barebox; +Cc: rouven.czerwinski

Hello Lucas,

On 6/3/25 16:47, Lucas Stach wrote:
> Am Dienstag, dem 03.06.2025 um 12:18 +0200 schrieb Ahmad Fatoum:
>> On 6/3/25 11:57, Lucas Stach wrote:
>> Ok, so if CR_C is unset, the cache is still used when reading/writing,
>> provided that the cache line is valid.
>>
> Exactly. Clearing CR_C disables cache allocation, but lookups in the
> cache still proceed as normal.

I hope to remember that this time. I am fairly sure you explained this
to me once before already...

>> We don't have many objects that are accessed both before and after
>> arm_early_mmu_cache_invalidate, so maybe that's why we didn't run into
>> more problems?
>>
> Yea, I would guess that the probability of hitting this issue with the
> handoff data, which isn't that big, is quite low. At least from the
> description above I think we can hit the same issues with the handoff
> data.

Ack.

>>> I guess it would be much better to simply have the
>>> arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU
>>> initialization at the very start of the PBL entry.
>>
>> We don't have a dedicated Cortex-A9 lowlevel entry function
>> unfortunately, just some for specific processors, e.g. the
>> imx6_cpu_lowlevel_init.
>>
>> We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC
>> options and depending on it, add the invalidation to
>> arm_cpu_lowlevel_init()? What do you think?
>>
> This would then trigger the invalidation even on systems that don't
> need it in case of a multiarch Barebox. There aren't that many Cortex
> A9 based SoCs supported in Barebox and all of them should have a SoC
> specific init function to apply the necessary workarounds, so I think
> it would be fine to call the cache invalidate from the SoC specific
> lowlevel init of those few SoCs?

Fair enough. How do we know we only need this for Cortex-A9 though?
Couldn't e.g. the Cortex-A8 also be affected?

Cheers,
Ahmad

> 
> Regards,
> Lucas
> 
>> Thanks,
>> Ahmad
>>
>>
>>>
>>> Regards,
>>> Lucas
>>>
>>>> This means on e.g. the i.MX6UL, we will now do one extra cache invalidation
>>>> that's not needed. This should be negligible and we are already had an
>>>> unconditional invalidation in __barebox_arm_entry.
>>>>
>>>> Note that this is a different implementation than what we do on ARM64,
>>>> there we load TF-A before it jumps to OP-TEE and assuming
>>>> non-architected caches or caches with uninitialized content on power-on
>>>> to be a dying breed, our ARM64 implementation is likely not affected.
>>>>
>>>> Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>>>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>>>> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
>>>> ---
>>>>  arch/arm/lib32/optee-early.c | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
>>>> index 0cda0ab163..b1dba67d42 100644
>>>> --- a/arch/arm/lib32/optee-early.c
>>>> +++ b/arch/arm/lib32/optee-early.c
>>>> @@ -35,6 +35,19 @@ int start_optee_early(void *fdt, void *tee)
>>>>  	/* We use setjmp/longjmp here because OP-TEE clobbers most registers */
>>>>  	ret = setjmp(tee_buf);
>>>>  	if (ret == 0) {
>>>> +		/*
>>>> +		 * At least OP-TEE v4.1.0 seems to not invalidate all dirty cache
>>>> +		 * lines before enabling the MMU. This can lead to spurious hangs
>>>> +		 * on return to barebox on systems where there might be left-over
>>>> +		 * dirty cache lines, whether from BootROM or because L2 cache
>>>> +		 * is non-architected and powers on with unpredictable content
>>>> +		 * like is the case with PL310 on i.MX6Q.
>>>> +		 *
>>>> +		 * Let's invalidate the caches here, so board entry points need
>>>> +		 * not bother.
>>>> +		 */
>>>> +		arm_early_mmu_cache_invalidate();
>>>> +
>>>>  		tee_start(0, 0, fdt);
>>>>  		longjmp(tee_buf, 1);
>>>>  	}
>>>
>>>
>>
> 
> 

-- 
Pengutronix e.K.                  |                             |
Steuerwalder Str. 21              | http://www.pengutronix.de/  |
31137 Hildesheim, Germany         | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686  | Fax:   +49-5121-206917-5555 |




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03 14:51         ` Ahmad Fatoum
@ 2025-06-03 15:20           ` Lucas Stach
  2025-06-04  9:57             ` Rouven Czerwinski
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas Stach @ 2025-06-03 15:20 UTC (permalink / raw)
  To: Ahmad Fatoum, Fabian Pflug, barebox; +Cc: rouven.czerwinski

Am Dienstag, dem 03.06.2025 um 16:51 +0200 schrieb Ahmad Fatoum:
[...]
> > > > I guess it would be much better to simply have the
> > > > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU
> > > > initialization at the very start of the PBL entry.
> > > 
> > > We don't have a dedicated Cortex-A9 lowlevel entry function
> > > unfortunately, just some for specific processors, e.g. the
> > > imx6_cpu_lowlevel_init.
> > > 
> > > We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC
> > > options and depending on it, add the invalidation to
> > > arm_cpu_lowlevel_init()? What do you think?
> > > 
> > This would then trigger the invalidation even on systems that don't
> > need it in case of a multiarch Barebox. There aren't that many Cortex
> > A9 based SoCs supported in Barebox and all of them should have a SoC
> > specific init function to apply the necessary workarounds, so I think
> > it would be fine to call the cache invalidate from the SoC specific
> > lowlevel init of those few SoCs?
> 
> Fair enough. How do we know we only need this for Cortex-A9 though?
> Couldn't e.g. the Cortex-A8 also be affected?

We can't be 100% sure without specific knowledge about each SoC
integration. Both the Cortex A8 [1] and Cortex A15 [2] TRMs define a
reset sequence that mandates the straps to be set in such a way that
the processor will clear all L1 and L2 memory arrays on power-on reset.

The only odd one where the TRM doesn't even mention memory arrays in
the reset sequence is the Cortex A9 [3], which pretty much lines up
with the number of SoCs where we have seen issues due to uninitialized
cache content.

Regards,
Lucas

[1] https://developer.arm.com/documentation/ddi0344/k/Cihcbcgi
[2] https://developer.arm.com/documentation/ddi0438/i/functional-description/clocking-and-resets/resets
[3] https://developer.arm.com/documentation/100511/0401/functional-description/clocking-and-resets/reset



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE
  2025-06-03 15:20           ` Lucas Stach
@ 2025-06-04  9:57             ` Rouven Czerwinski
  0 siblings, 0 replies; 9+ messages in thread
From: Rouven Czerwinski @ 2025-06-04  9:57 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Ahmad Fatoum, Fabian Pflug, barebox

Hi,

On Tue, 3 Jun 2025 at 17:20, Lucas Stach <l.stach@pengutronix.de> wrote:
>
> Am Dienstag, dem 03.06.2025 um 16:51 +0200 schrieb Ahmad Fatoum:
> [...]
> > > > > I guess it would be much better to simply have the
> > > > > arm_early_mmu_cache_invalidate() as part of the Cortex A9 lowlevel CPU
> > > > > initialization at the very start of the PBL entry.

To add my few cents:
I have had customer systems where without cache invalidation,
imx-usb-loader would
sporadically hang. This hang was fixed by calling
arm_early_mmu_cache_invalidate()
in the lowlevel pbl code as well. Barebox already has i.MX6 boards
which to call the
invalidation early in PBL, but unfortunately none have a comment or
commit description
why.

> > > >
> > > > We don't have a dedicated Cortex-A9 lowlevel entry function
> > > > unfortunately, just some for specific processors, e.g. the
> > > > imx6_cpu_lowlevel_init.
> > > >
> > > > We could add CONFIG_CPU_CORTEX_A9, select it from the relevant SoC
> > > > options and depending on it, add the invalidation to
> > > > arm_cpu_lowlevel_init()? What do you think?
> > > >
> > > This would then trigger the invalidation even on systems that don't
> > > need it in case of a multiarch Barebox. There aren't that many Cortex
> > > A9 based SoCs supported in Barebox and all of them should have a SoC
> > > specific init function to apply the necessary workarounds, so I think
> > > it would be fine to call the cache invalidate from the SoC specific
> > > lowlevel init of those few SoCs?
> >
> > Fair enough. How do we know we only need this for Cortex-A9 though?
> > Couldn't e.g. the Cortex-A8 also be affected?
>
> We can't be 100% sure without specific knowledge about each SoC
> integration. Both the Cortex A8 [1] and Cortex A15 [2] TRMs define a
> reset sequence that mandates the straps to be set in such a way that
> the processor will clear all L1 and L2 memory arrays on power-on reset.
>
> The only odd one where the TRM doesn't even mention memory arrays in
> the reset sequence is the Cortex A9 [3], which pretty much lines up
> with the number of SoCs where we have seen issues due to uninitialized
> cache content.
>
> Regards,
> Lucas
>
> [1] https://developer.arm.com/documentation/ddi0344/k/Cihcbcgi
> [2] https://developer.arm.com/documentation/ddi0438/i/functional-description/clocking-and-resets/resets
> [3] https://developer.arm.com/documentation/100511/0401/functional-description/clocking-and-resets/reset

Adding this to imx6_cpu_lowlevel_init() is IMO the correct way as well.

- Rouven



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution
  2025-06-03  9:20 [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Fabian Pflug
  2025-06-03  9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug
@ 2025-06-04 10:00 ` Rouven Czerwinski
  1 sibling, 0 replies; 9+ messages in thread
From: Rouven Czerwinski @ 2025-06-04 10:00 UTC (permalink / raw)
  To: Fabian Pflug; +Cc: barebox, lst, Ahmad Fatoum

Hi,

On Tue, 3 Jun 2025 at 11:20, Fabian Pflug <f.pflug@pengutronix.de> wrote:
>
> We expect start_optee_early() to be called with caches disabled, so there
> is no point in calling sync_caches_for_execution inside.
>
> Therefore let's verify the assumption that caches are indeed disabled
> and drop the useless call.
>
> Co-authored-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> Signed-off-by: Fabian Pflug <f.pflug@pengutronix.de>
> ---
>  arch/arm/lib32/optee-early.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/lib32/optee-early.c b/arch/arm/lib32/optee-early.c
> index 735d829c99..0cda0ab163 100644
> --- a/arch/arm/lib32/optee-early.c
> +++ b/arch/arm/lib32/optee-early.c
> @@ -7,6 +7,7 @@
>   */
>  #include <asm/cache.h>
>  #include <asm/setjmp.h>
> +#include <asm/system.h>
>  #include <tee/optee.h>
>  #include <debug_ll.h>
>  #include <string.h>
> @@ -24,13 +25,16 @@ int start_optee_early(void *fdt, void *tee)
>         if (ret < 0)
>                 return ret;
>
> +       /* We expect this function to be called with data caches disabled */
> +       if (get_cr() & CR_C)
> +               return -ENXIO;
> +

I don't think returning an error here is correct, we should probably
just panic()
with a useful error message. None of the current callers of the function check
the return code, so at best the board starts without OP-TEE, at worst
it randomly
hangs because no PSCI provider can be found.

>         memcpy((void *)hdr->init_load_addr_lo, tee + sizeof(*hdr), hdr->init_size);
>         tee_start = (void *) hdr->init_load_addr_lo;
>
>         /* We use setjmp/longjmp here because OP-TEE clobbers most registers */
>         ret = setjmp(tee_buf);
>         if (ret == 0) {
> -               sync_caches_for_execution();
>                 tee_start(0, 0, fdt);
>                 longjmp(tee_buf, 1);
>         }
> --
> 2.39.5
>

- Rouven



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-06-04 10:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-03  9:20 [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Fabian Pflug
2025-06-03  9:20 ` [PATCH 2/2] ARM: optee-early: invalidate caches before jump to OP-TEE Fabian Pflug
2025-06-03  9:57   ` Lucas Stach
2025-06-03 10:18     ` Ahmad Fatoum
2025-06-03 14:47       ` Lucas Stach
2025-06-03 14:51         ` Ahmad Fatoum
2025-06-03 15:20           ` Lucas Stach
2025-06-04  9:57             ` Rouven Czerwinski
2025-06-04 10:00 ` [PATCH 1/2] ARM: optee-early: drop superfluous sync_caches_for_execution Rouven Czerwinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox