mail archive of the barebox mailing list
* LS1021A performance
From: Renaud Barbier @ 2023-03-30 11:34 UTC (permalink / raw)
  To: Barebox List

Hello,
I am comparing the performance of the LS1021A under Linux and under Barebox, and against a PPC P1014.

I noticed the following when computing the md5sum of a 1 MB file:

On Barebox, md5sum of a 1MB file in memory:
barebox@LS1021A-IOT Board:/ time md5sum /file
27a45e1d2fc461638aafce09b6397841  /file
time: 494ms

The DDR is cacheable:
barebox:/ mmuinfo 0x80000000
PAR result for 0x80000000:
 privileged read: 0x8000005c
  Physical Address [31:12]: 0x80000000
  Reserved [11]:            0x0
  Not Outer Shareable [10]: 0x0
  Non-Secure [9]:           0x0
  Impl. def. [8]:           0x0
  Shareable [7]:            0x0
  Inner mem. attr. [6:4]:   0x5 (0b101 Write-Back, Write-Allocate)
  Outer mem. attr. [3:2]:   0x3 (0b11 Write-Back, no Write-Allocate)
  SuperSection [1]:         0x0
  Failure [0]:              0x0
 privileged write: 0x8000005c
  Physical Address [31:12]: 0x80000000
  Reserved [11]:            0x0
  Not Outer Shareable [10]: 0x0
  Non-Secure [9]:           0x0
  Impl. def. [8]:           0x0
  Shareable [7]:            0x0
  Inner mem. attr. [6:4]:   0x5 (0b101 Write-Back, Write-Allocate)
  Outer mem. attr. [3:2]:   0x3 (0b11 Write-Back, no Write-Allocate)
  SuperSection [1]:         0x0
  Failure [0]:              0x0


On a Freescale P1014 (PPC) with Barebox:
time md5sum self1
f168af3541bc7109150e6be2f6c8cde4  self1
time: 57ms

On Linux:
[root@openware]# time md5sum /tmp/mtd0
26d8158619e5791859519654557aeeba  /tmp/mtd0

real    0m0.029s
user    0m0.025s
sys     0m0.001s

This is almost a 20-fold difference (494 ms in Barebox versus 29 ms under Linux).

According to my in-circuit emulator, the cache is enabled under both Linux and Barebox. My guess is that it comes down to how the MMU is used.
Any input on how to speed up the boot loader would be appreciated.

Cheers,
Renaud






* Re: LS1021A performance
From: Ahmad Fatoum @ 2023-03-30 11:45 UTC (permalink / raw)
  To: Renaud Barbier, Barebox List

Hello Renaud,


On 30.03.23 13:34, Renaud Barbier wrote:
> Hello,
> I am comparing the performance of the LS1021A under Linux and under Barebox, and against a PPC P1014.

> On Linux:
> [root@openware]# time md5sum /tmp/mtd0
> 26d8158619e5791859519654557aeeba  /tmp/mtd0
> 
> real    0m0.029s
> user    0m0.025s
> sys     0m0.001s
> 
> This is almost a 20-fold difference (494 ms in Barebox versus 29 ms under Linux).
> 
> According to my in-circuit emulator, the cache is enabled under both Linux and Barebox. My guess is that it comes down to how the MMU is used.
> Any input on how to speed up the boot loader would be appreciated.

Can you compare SHA256 instead and see if the difference is still as stark?
Make sure that CONFIG_DIGEST_SHA256_ARM is enabled.

Do barebox and Linux run at the same CPU frequency?

Cheers,
Ahmad


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |





* RE: LS1021A performance
From: Renaud Barbier @ 2023-03-30 13:31 UTC (permalink / raw)
  To: Ahmad Fatoum, Barebox List



 
> Can you compare SHA256 instead and see if the difference is still as stark?
> Make sure that CONFIG_DIGEST_SHA256_ARM is enabled.
The SHA256 is enabled. SHA256 on a 1 MB file:
Barebox: 843ms
Linux: 
[root@openware]# time sha256sum /tmp/mtd0
eef67a3327e3eaa50ee7b1dad87901465f00d76a6308e360a2fedab82c79f493  /tmp/mtd0

real    0m0.059s
user    0m0.056s
sys     0m0.001s

On another note, the boot loader on the LS1021A is much slower than on the PPC P1014.
I compare those two as we used the LS1021A as a replacement for the P1014 on a board (same peripherals, same boot sequence).
The P1014 reaches the prompt in 200 ms while the LS1021A takes 700 ms.

Also, I noticed that the page flags for the DDR memory differ between Barebox and Linux, as seen by the Lauterbach:
Barebox: write-back/no allocate
Linux: inner: write-back/allocate, outer: write-back/allocate
Could that mean the L2 cache is not used?
> 
> Do barebox and Linux run at the same CPU frequency?
According to the Lauterbach, the clock ratios have not changed in the clocking registers.



* Re: LS1021A performance
From: Lucas Stach @ 2023-03-30 14:17 UTC (permalink / raw)
  To: Renaud Barbier, Ahmad Fatoum, Barebox List

Hi Renaud,

On Thursday, 30.03.2023 at 13:31 +0000, Renaud Barbier wrote:
> 
>  
> > Can you compare SHA256 instead and see if the difference is still as stark?
> > Make sure that CONFIG_DIGEST_SHA256_ARM is enabled.
> The SHA256 is enabled. SHA256 on a 1 MB file:
> Barebox: 843ms
> Linux: 
> [root@openware]# time sha256sum /tmp/mtd0
> eef67a3327e3eaa50ee7b1dad87901465f00d76a6308e360a2fedab82c79f493  /tmp/mtd0
> 
> real    0m0.059s
> user    0m0.056s
> sys     0m0.001s
> 
> On another note, the boot loader on the LS1021A is much slower than on the PPC P1014.
> I compare those two as we used the LS1021A as a replacement for the P1014 on a board (same peripherals, same boot sequence).
> The P1014 reaches the prompt in 200 ms while the LS1021A takes 700 ms.
> 
> Also, I noticed that the page flags for the DDR memory differ between Barebox and Linux, as seen by the Lauterbach:
> Barebox: write-back/no allocate
> Linux: inner: write-back/allocate, outer: write-back/allocate
> Could that mean the L2 cache is not used?
> > 
> > Do barebox and Linux run at the same CPU frequency?
> According to the Lauterbach, the clock ratios have not changed in the clocking registers.
> > 

As the LS1021A is based on a Cortex-A7, your board's lowlevel init needs
to call cortex_a7_lowlevel_init() for the caches to work properly.

It's probably a good idea to add a ls1021 lowlevel init function which
calls both of those functions together, like
imx6ul_cpu_lowlevel_init().

Regards,
Lucas






* RE: LS1021A performance
From: Renaud Barbier @ 2023-03-30 15:17 UTC (permalink / raw)
  To: Lucas Stach, Ahmad Fatoum, Barebox List


> > > Can you compare SHA256 instead and see if the difference is still as stark?
> > > Make sure that CONFIG_DIGEST_SHA256_ARM is enabled.
> > The SHA256 is enabled. SHA256 on a 1 MB file:
> > Barebox: 843ms
> > Linux:
> > [root@openware]# time sha256sum /tmp/mtd0
> > eef67a3327e3eaa50ee7b1dad87901465f00d76a6308e360a2fedab82c79f493  /tmp/mtd0
> >
> > real    0m0.059s
> > user    0m0.056s
> > sys     0m0.001s
> >
> > On another note, the boot loader on the LS1021A is much slower than on the PPC P1014.
> > I compare those two as we used the LS1021A as a replacement for the P1014 on a board (same peripherals, same boot sequence).
> > The P1014 reaches the prompt in 200 ms while the LS1021A takes 700 ms.
> >
> > Also, I noticed that the page flags for the DDR memory differ between Barebox and Linux, as seen by the Lauterbach:
> > Barebox: write-back/no allocate
> > Linux: inner: write-back/allocate, outer: write-back/allocate
> > Could that mean the L2 cache is not used?
> > >
> > > Do barebox and Linux run at the same CPU frequency?
> > According to the Lauterbach, the clock ratios have not changed in the clocking registers.
> > >
> 
> As the LS1021A is based on a Cortex A7 your board lowlevel init needs to call
> cortex_a7_lowlevel_init() for the caches to work properly.
> 
> It's probably a good idea to add a ls1021 lowlevel init function which calls both
> of those functions together, like imx6ul_cpu_lowlevel_init().
> 

With this function call, the boot loader boots 3 times faster and sha256sum for 1 MB dropped from 843 ms to 118 ms (roughly 7 times faster).
However, this now breaks the gianfar Ethernet driver, which likely needs cache flushes or DMA-allocated descriptors.

Thanks.
