mail archive of the barebox mailing list
* Reset on Beaglebone Black has become unreliable/broken
@ 2024-11-28  9:07 Konstantin Kletschke
  2024-11-28  9:23 ` Ahmad Fatoum
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-11-28  9:07 UTC (permalink / raw)
  To: barebox

Dear barebox community and hackers,

we use barebox 022.04.0-dirty from 
https://github.com/menschel-d/meta-barebox.git in our yocto kirkstone project.
This worked for ages in up to hundreds of BBBs without any issue.

Since last week I have the problem, that the system is not able to
reboot (linux userspace issuing reboot command) or reset (command reset
at barebox prompt) anymore in _some_ of the BBBs we got delivered from
SEEED (we get a couple of hundred a couple of times per year). This affects
a single-digit percentage of the boards.

Linux userspace running, issuing reboot command:

systemd-shutdown[1]: Rebooting.
reboot: Restarting system
-> Then gets stuck

Barebox prompt, issuing reset command:

Hit m for menu or ctrl-c to stop autoboot:    3
barebox@TI AM335x BeagleBone black:/ reset
-> Then gets stuck

The same applies when the barebox watchdog triggers a reset, and the
hardware reset button S2 on the BBB does not work on those BBBs
either! The S2 button is connected to the CPU's NRESET_INOUT ball A10.

If I test those use cases with stock u-boot delivered with the BBB the
reset/reboot works each time.

From the symptoms I guess that barebox is not able to start in every
case when it should.
Where can I start to investigate such an error? What could make the
hardware so marginal that something on the edge no longer works?

I learned that the reset is something like a soft reset done in software.
Where in the source tree can I look for this part?

Kind Regards
Konstantin 
Kletschke
-- 
INSIDE M2M GmbH
Konstantin Kletschke
Berenbosteler Straße 76 B
30823 Garbsen

Telefon: +49 (0) 5137 90950136
Mobil: +49 (0) 151 15256238
Fax: +49 (0) 5137 9095010

konstantin.kletschke@inside-m2m.de
http://www.inside-m2m.de 

Geschäftsführung: Michael Emmert, Derek Uhlig
HRB: 111204, AG Hannover





* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-11-28  9:07 Reset on Beaglebone Black has become unreliable/broken Konstantin Kletschke
@ 2024-11-28  9:23 ` Ahmad Fatoum
  2024-11-28  9:46   ` Konstantin Kletschke
  0 siblings, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-11-28  9:23 UTC (permalink / raw)
  To: Konstantin Kletschke, barebox

Hello Konstantin,

On 28.11.24 10:07, Konstantin Kletschke wrote:
> Dear barebox community and hackers,
> 
> we use barebox 022.04.0-dirty from 

I assume this should be v2022.04? -dirty means you have local patches
on top. Do any of them touch SoC-specific, board-specific parts
like clock or power?

> https://github.com/menschel-d/meta-barebox.git in our yocto kirkstone project.
> This worked for ages in up to hundreds of BBBs without any issue.
> 
> Since last week I have the problem, that the system is not able to
> reboot (linux userspace issuing reboot command) or reset (command reset
> at barebox prompt) anymore in _some_ of the BBBs we got delivered from
> SEEED (we get a couple of hundreds a couple of times per year). Speaking
> of some one digit percentage.

What changed over the last week on the software side? I understand barebox
stayed the same? Is the kernel still the same?

> Linux userspace running, issuing reboot command:
> 
> systemd-shutdown[1]: Rebooting.
> reboot: Restarting system
> -> Then gets stuck

On affected hardware: Does this happen always or only sometimes?

> Barebox prompt, issuing reset command:
> 
> Hit m for menu or ctrl-c to stop autoboot:    3
> barebox@TI AM335x BeagleBone black:/ reset
> -> Then gets stuck
> 
> This also applies to triggering the barebox's watchdog to trigger reset
> and also the hardware line on the BBB S2 is not working on those BBBs
> too! The S2 button is connected to CPU's NRESET_INOUT ball A10.

This sounds very similar to the issue fixed in commit 9c1a78f959dd
("Revert "ARM: beaglebone: init MPU speed to 800Mhz""), but that's already
included in v2022.04.0, hence the question if you have patches that
do anything similar.

> If I test those use cases with stock u-boot delivered with the BBB the
> reset/reboot works each time.
> 
> From the symptoms I guess the barebox is not able to start in each case
> when it should.

Yes, but it sounds strange that only now these problems pop up?

> Where can I start to investigate such an error, what could cause the
> hardware glitching away that something is on the edge which does not
> work anymore?

Besides checking what changed, you should check if Linux is playing
around with the voltages powering the SoC and if it does, disable that
to see if it improves the situation.

Afterwards, we can look into how you can make barebox resilient against
this.

> I learned it is something like a soft reset which is done in software,
> where can I look in the sourcetree for this special part?

Your barebox restart handler is probably am33xx_restart_soc (named
"soc" in reset -l output).

Cheers,
Ahmad

> 
> Kind Regards
> Konstantin 
> Kletschke


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |




* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-11-28  9:23 ` Ahmad Fatoum
@ 2024-11-28  9:46   ` Konstantin Kletschke
  2024-11-28 11:18     ` Ahmad Fatoum
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-11-28  9:46 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Thu, Nov 28, 2024 at 10:23:10AM +0100, Ahmad Fatoum wrote:

> I assume this should be v2022.04? -dirty means you have local patches
> on top. Do any of them touch SoC-specific, board-specific parts
> like clock or power?

Yes, it is "barebox 2022.04.0-dirty #1 Tue Sep 10 08:45:54 UTC 2024".
The patches we apply do not touch any clock or power, we touch:
Environment, kernel cmdline, watchdog settings, bootchooser config, 
autoabortkey. Config stuff.

> What changed over the last week on the software side? I understand barebox
> stayed the same? Is the kernel still the same?

We changed nothing. I have been shipping this barebox version with the
kernel for a couple of months. Last week we only ramped up quantity, but the
failure rate is so high that it should have happened a couple of times before.

> On affected hardware: Does this happen always or only some times?

Always. Easily reproducible.
Meanwhile I realized on affected BBBs it can be reproduced this way:

Boot, hit Ctrl-C to stop barebox at prompt.
Hit S1 button which is wired to NRESET_INOUT ball A10 (it's not S2 as I
initially wrote, S1).
System is stuck/frozen/dead.

> This sounds very similar to the issue fixed in commit 9c1a78f959dd
> ("Revert "ARM: beaglebone: init MPU speed to 800Mhz""), but that's already
> included in v2022.04.0, hence the question if you have patches that
> do anything similar.

Sounds interesting, I will take a look. As said, we do not patch clocks,
voltages, or anything like that.

> Yes, but it sounds strange that only now these problems pop up?

Yes. Last week we started to experience this problem in production, we
have ~200 working BBBs, ~20 have this problem. The batch worked
flawlessly, but suddenly a couple of broken BBBs piled up one day, and
now it happens every so often.

I am not even sure whether software is to blame or whether the hardware is
or has become glitchy, but after flashing stock u-boot these devices are
still able to reset/restart on their own.

> Besides checking what changed, you should check if Linux is playing
> around with the voltages powering the SoC and if it does, disable that
> to see if it improves the situation.

Sadly (or gladly?) linux is not involved on affected BBBs. Boot, stop in
bootloader, hit S1, system freezes.

> Your barebox restart handler is probably am33xx_restart_soc (named
> "soc" in reset -l output).

I will poke around, I have never dealt with reset code in my life :-)

Regards
Konsti



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-11-28  9:46   ` Konstantin Kletschke
@ 2024-11-28 11:18     ` Ahmad Fatoum
  2024-11-28 12:02       ` Konstantin Kletschke
  0 siblings, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-11-28 11:18 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

Hi,

On 28.11.24 10:46, Konstantin Kletschke wrote:
> On Thu, Nov 28, 2024 at 10:23:10AM +0100, Ahmad Fatoum wrote:
> 
>> I assume this should be v2022.04? -dirty means you have local patches
>> on top. Do any of them touch SoC-specific, board-specific parts
>> like clock or power?
> 
> Yes, it is "barebox 2022.04.0-dirty #1 Tue Sep 10 08:45:54 UTC 2024".
> The patches we apply do not touch any clock or power, we touch:
> Environment, kernel cmdline, watchdog settings, bootchooser config, 
> autoabortkey. Config stuff.
> 
>> What changed over the last week on the software side? I understand barebox
>> stayed the same? Is the kernel still the same?
> 
> We changed nothing. I have been shipping this barebox version with the
> kernel for a couple of months. Last week we only ramped up quantity, but the
> failure rate is so high that it should have happened a couple of times before.

Are you still building with the same toolchain?

>> On affected hardware: Does this happen always or only some times?
> 
> Always. Easily reproducible.
> Meanwhile I realized on affected BBBs it can be reproduced this way:
> 
> Boot, hit Ctrl-C to stop barebox at prompt.
> Hit S1 button which is wired to NRESET_INOUT ball A10 (it's not S2 as I
> initially wrote, S1).
> System is stuck/frozen/dead.

So repeating these steps on some boards never shows any issues and on
some others it always shows issues?

>> This sounds very similar to the issue fixed in commit 9c1a78f959dd
>> ("Revert "ARM: beaglebone: init MPU speed to 800Mhz""), but that's already
>> included in v2022.04.0, hence the question if you have patches that
>> do anything similar.
> 
> Sounds interesting, I will take a look. As said, we patch no clock
> voltages or something like that.

Ok.

>> Yes, but it sounds strange that only now these problems pop up?
> 
> Yes. Last week we started to experience this problem in production, we
> have ~200 working BBBs, ~20 have this problem. The batch worked
> flawlessly, but suddenly a couple of broken BBBs piled up one day, and
> now it happens every so often.
> 
> I am not even sure whether software is to blame or whether the hardware is
> or has become glitchy, but after flashing stock u-boot these devices are
> still able to reset/restart on their own.

My guess would be an incompatibility between the settings in the PMIC
and what barebox configures. barebox doesn't touch the PMIC and tries
to use clock rates that should be safe regardless of what changes Linux
did to the PMIC.

U-Boot, depending on version, may be reprogramming the PMIC to allow
for higher clock rates that barebox doesn't currently go for and this
might be related to the issues you are seeing.

>> Besides checking what changed, you should check if Linux is playing
>> around with the voltages powering the SoC and if it does, disable that
>> to see if it improves the situation.
> 
> Sadly (or gladly?) linux is not involved on affected BBBs. Boot, stop in
> bootloader, hit S1, system freezes.

So this happens even after a completely cold reset?

>> Your barebox restart handler is probably am33xx_restart_soc (named
>> "soc" in reset -l output).
> 
> I will poke around, never in my life was dealing with reset code :-)

I'd suggest you enable CONFIG_DEBUG_LL and look if you see at least a >
character on the serial console output by the MLO.

If you don't see it, try moving these lines:

  am33xx_uart_soft_reset((void *)AM33XX_UART0_BASE);
  am33xx_enable_uart0_pin_mux();
  omap_debug_ll_init();
  putc_ll('>');

to the start of beaglebone_sram_init() and see if you get the > printed.

The point is making sure that barebox itself starts up before seeing where
it's getting stuck.

Cheers,
Ahmad

> 
> Regards
> Konsti
> 
> 



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-11-28 11:18     ` Ahmad Fatoum
@ 2024-11-28 12:02       ` Konstantin Kletschke
  2024-11-28 15:25         ` Konstantin Kletschke
  2024-12-02 12:41         ` Ahmad Fatoum
  0 siblings, 2 replies; 22+ messages in thread
From: Konstantin Kletschke @ 2024-11-28 12:02 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Thu, Nov 28, 2024 at 12:18:45PM +0100, Ahmad Fatoum wrote:
> 
> Are you still building with the same toolchain?

Yes, I am always using a yocto kirkstone with its toolchain:
    - git clone -b kirkstone git://git.yoctoproject.org/poky.git
    - git -C poky checkout tags/yocto-4.0.13

    and

    - git clone -b kirkstone https://github.com/menschel-d/meta-barebox.git
    - mv meta-barebox poky/meta-barebox

> So repeating these steps on some boards never shows any issues and on
> some others it always shows issues?

Yes

> So this happens even after a completely cold reset?

Yes: Power on, hit S1 or type reset in stopped barebox -> freeze

> I'd suggest you enable CONFIG_DEBUG_LL and look if you see at least a >
> character on the serial console output by the MLO.
> 
> If you don't see it, try moving these lines:
> 
>   am33xx_uart_soft_reset((void *)AM33XX_UART0_BASE);
>   am33xx_enable_uart0_pin_mux();
>   omap_debug_ll_init();
>   putc_ll('>');
> 
> to the start of beaglebone_sram_init() and see if you get the > printed.
> 
> The point is making sure that barebox itself starts up before seeing where
> it's getting stuck.

I will try that immediately.

I reproduced the same behaviour on a non-resetting BBB device with a
different software setup:

Checked out current barebox git: barebox 2024.10.0-00150-g7a3cb7e6fd63 #2 Thu Nov 28 12:37:15 CET 2024
Changed CONFIG_BAREBOX_MAX_IMAGE_SIZE from 0x1b400 to 0x2b400
Did am335x_mlo_defconfig and omap_defconfig and copied those images.
I used another crosscompiler toolchain for this: gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf

Same error behaviour: Power up -> S1 or "reset" produce freeze.

Will test CONFIG_DEBUG_LL and/or move those lines around.

Regards
Konsti



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-11-28 12:02       ` Konstantin Kletschke
@ 2024-11-28 15:25         ` Konstantin Kletschke
  2024-12-02 12:41         ` Ahmad Fatoum
  1 sibling, 0 replies; 22+ messages in thread
From: Konstantin Kletschke @ 2024-11-28 15:25 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Thu, Nov 28, 2024 at 01:02:01PM +0100, Konstantin Kletschke wrote:
> 
> > I'd suggest you enable CONFIG_DEBUG_LL and look if you see at least a >
> > character on the serial console output by the MLO.
> > 
> > If you don't see it, try moving these lines:
> > 
> >   am33xx_uart_soft_reset((void *)AM33XX_UART0_BASE);
> >   am33xx_enable_uart0_pin_mux();
> >   omap_debug_ll_init();
> >   putc_ll('>');
> > 
> > to the start of beaglebone_sram_init() and see if you get the > printed.
> > 
> > The point is making sure that barebox itself starts up before seeing where
> > it's getting stuck.
> 
> I will try that immediately.


I tried that.

make am335x_mlo_defconfig

Then I set:

CONFIG_HAS_DEBUG_LL=y
CONFIG_DEBUG_LL=y
CONFIG_DEBUG_OMAP_UART=y
CONFIG_DEBUG_AM33XX_UART=y
CONFIG_DEBUG_OMAP_UART_PORT=0

Then I removed the MTD driver. The old hack of setting CONFIG_BAREBOX_MAX_IMAGE_SIZE
to 0x2b400 somehow did not work; the image(s) did not boot at all. Removing
MTD allowed me to keep the old 0x1b400 as the size, and the image booted.

Then I did 

make omap_defconfig 

and set

CONFIG_HAS_DEBUG_LL=y
CONFIG_DEBUG_LL=y
CONFIG_DEBUG_OMAP_UART=y
# CONFIG_DEBUG_OMAP3_UART is not set
CONFIG_DEBUG_AM33XX_UART=y
CONFIG_DEBUG_OMAP_UART_PORT=0

copied both images and booted, which works.
At the start I see a glitch on the serial console like this:

~�W-�,-H]
                           ���k׋�ҫ�.LWC�C�C��arebox 2024.10.0-00150-g7a3cb7e6fd63-dirty #1 Thu Nov 28 14:35:15 CET 2024


Other baud rate?

However, resetting via "reset" or S1 does not reveal a ">" or anything else. So
I moved the 4 lines in arch/arm/boards/beaglebone/lowlevel.c directly
below "void *fdt;". Sadly, barebox then does not seem to boot at all: I see no
output on the console anymore, and the LED that blinks when barebox is idle
does not blink.

Regards
Konsti



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-11-28 12:02       ` Konstantin Kletschke
  2024-11-28 15:25         ` Konstantin Kletschke
@ 2024-12-02 12:41         ` Ahmad Fatoum
  2024-12-02 14:15           ` Konstantin Kletschke
  1 sibling, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-12-02 12:41 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

Hello Konstantin,

On 28.11.24 13:02, Konstantin Kletschke wrote:
> On Thu, Nov 28, 2024 at 12:18:45PM +0100, Ahmad Fatoum wrote:
>>
>> Are you still building with the same toolchain?
> Checked out current barebox git: barebox 2024.10.0-00150-g7a3cb7e6fd63 #2 Thu Nov 28 12:37:15 CET 2024
> Changed CONFIG_BAREBOX_MAX_IMAGE_SIZE from 0x1b400 to 0x2b400

Why do you do this step? 0x1b400 == 109KiB, which is chosen, because the MLO
needs to fit into the On-Chip SRAM of the AM335x.

Increasing the size won't magically increase RAM size, but it may result
in a truncated MLO being loaded into memory or worse: barebox overwriting
memory and MMIO that it shouldn't be touching.

Do you also change this when compiling your normal image?

Cheers,
Ahmad

> Did am335x_mlo_defconfig and omap_defconfig and copied those images.
> I used another crosscompiler toolchain for this: gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf
> 
> Same error behaviour: Power up -> S1 or "reset" produce freeze.
> 
> Will test CONFIG_DEBUG_LL and/or move those lines around.
> 
> Regards
> Konsti
> 
> 



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-02 12:41         ` Ahmad Fatoum
@ 2024-12-02 14:15           ` Konstantin Kletschke
  2024-12-03 18:28             ` Ahmad Fatoum
  2024-12-03 18:34             ` Konstantin Kletschke
  0 siblings, 2 replies; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-02 14:15 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Mon, Dec 02, 2024 at 01:41:26PM +0100, Ahmad Fatoum wrote:
> Hello Konstantin,

Hello Ahmad,

> > Changed CONFIG_BAREBOX_MAX_IMAGE_SIZE from 0x1b400 to 0x2b400
> 
> Why do you do this step? 0x1b400 == 109KiB, which is chosen, because the MLO
> needs to fit into the On-Chip SRAM of the AM335x.

Well, the current master of barebox does not fit actually, if I do "make
am335x_mlo_defconfig" and "make" I end up with Error

images/start_am33xx_afi_gf_sram.pblb size 112288 > maximum size 111616

I read something about too many features being enabled, and if I read correctly
there was traffic on this mailing list today about dealing with this
problem. I also read somewhere on the internet about someone increasing this
size and it going well...

> Increasing the size won't magically increase RAM size, but it may result
> in a truncated MLO being loaded into memory or worse: barebox overwriting
> memory and MMIO that it shouldn't be touching.

... which was not a good idea for me to adopt; I was not aware this limit
is physically tied to the internal SRAM. Oops.

> Do you also change this when compiling your normal image?

No way! Only changes in environment in a production version 2022.04.0.
To deal with debugging and code changes and faster deployment I have
additionally downloaded current master git. In the latter one I do "make
am335x_mlo_defconfig" but after that I enter "make menuconfig" and
disable "CONFIG_MTD" resulting in a successful compile.

Both versions, the production 2022.04.0 and the current master
(defconfig, MTD disabled), behave
the same: on most BBB devices, resetting at the barebox prompt via "reset" or
pressing S1 works reliably every time; on some (a single-digit percentage)
it never works.

I thought about the PMIC handling. I see barebox does not do anything
with it, relies on reset default and sets CPU speed to 500MHz in
lowlevel.c:

am33xx_pll_init(MPUPLL_M_500, DDRPLL_M_400);

Which is totally reasonable, the voltage is 1.1 V then in this mode after
powerup.

u-boot seems (I am not 100% sure) to set 1.3 V but runs at 1000 MHz, which is
reasonable too. So there is a difference but not a fatal one.

May I kindly ask how to properly enable the LL debugging?

I do "make am335x_mlo_defconfig", disable CONFIG_MTD and set 

CONFIG_DEBUG_LL=y
CONFIG_DEBUG_OMAP_UART=y
CONFIG_DEBUG_AM33XX_UART=y
CONFIG_DEBUG_OMAP_UART_PORT=0

and compile a new MLO and copy it over. Does the other part, barebox.bin,
need to be handled the same way?
The normal serial console I use is attached to UART0; is it safe to use
UART_PORT 0 here as well?

This starts well on cold boot, I get some additional non readable chars
at startup, like this:

2~�W-�,-H]
          ���k׋�ҫ�.LWC�C�C��arebox 2024.10.0-00152-g53c99b9f550b-dirty #15 Mon Dec 2 15:07:37 CET 2024

On "reset" or S1 I get no output at all, like without LL debug.

I will try to move the 4 lines in lowlevel.c like you suggested...

Regards
Konstantin



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-02 14:15           ` Konstantin Kletschke
@ 2024-12-03 18:28             ` Ahmad Fatoum
  2024-12-03 18:51               ` Konstantin Kletschke
  2024-12-03 18:34             ` Konstantin Kletschke
  1 sibling, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-12-03 18:28 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

On 02.12.24 15:15, Konstantin Kletschke wrote:
> On Mon, Dec 02, 2024 at 01:41:26PM +0100, Ahmad Fatoum wrote:
> I thought about the PMIC handling. I see barebox does not do anything
> with it, relies on reset default and sets CPU speed to 500MHz in
> lowlevel.c:
> 
> am33xx_pll_init(MPUPLL_M_500, DDRPLL_M_400);
> 
> Which is totally reasonable, the voltage is 1,1V then in this mode after
> powerup.
> 
> u-boot seems (I am not 100% sure) to set 1,3V but goes 1000MHz, which is
> reasonable too. So there is a difference but not a fatal one.
> 
> May I kindly ask how to properly enable the LL debugging?
> 
> I do "make am335x_mlo_defconfig", disable CONFIG_MTD and set 
> 
> CONFIG_DEBUG_LL=y
> CONFIG_DEBUG_OMAP_UART=y
> CONFIG_DEBUG_AM33XX_UART=y
> CONFIG_DEBUG_OMAP_UART_PORT=0
> 
> and compile a new MLO and copy it over. Does the other part barebox.bin
> need to be handled the same?

You can enable it for barebox.bin too, but enabling it for MLO only
should work too.

> The normal serial console I use is attached to UART0; is it safe to use
> UART_PORT 0 here as well?

Yes, that would be my expectation.

> This starts well on cold boot, I get some additional non readable chars
> at startup, like this:
> 
> 2~�W-�,-H]
>           ���k׋�ҫ�.LWC�C�C��arebox 2024.10.0-00152-g53c99b9f550b-dirty #15 Mon Dec 2 15:07:37 CET 2024

Hmm, CONFIG_BAUDRATE is set correctly?

> On "reset" or S1 I get no output at all, like without LL debug.
> 
> I will try to move the 4 lines in lowlevel.c like you suggested...

If it doesn't work at a later place, it won't work when you move it
earlier. I have a BBB here, so I can give enabling DEBUG_LL a try
too if you get stuck.

Cheers,
Ahmad

> 
> Regards
> Konstantin
> 
> 



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-02 14:15           ` Konstantin Kletschke
  2024-12-03 18:28             ` Ahmad Fatoum
@ 2024-12-03 18:34             ` Konstantin Kletschke
  2024-12-03 18:46               ` Ahmad Fatoum
  1 sibling, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-03 18:34 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox


Today I tried the following for debugging purposes:

In am33xx_generic.c in arch/arm/mach-omap there is in
am33xx_restart_soc() this:

writel(AM33XX_PRM_RSTCTRL_RESET, AM33XX_PRM_RSTCTRL);

which is 0x1 written to 0x44e00f00 and causes a warm restart.

I can simulate this with "mw 0x44e00f00 0x1" which shows the freeze 
I see (upon restart) on affected BBBs.

When I change the value to 0x2 (cold restart) the affected BBBs restart
successfully!

Does this ring a bell for people more experienced? This is not meant as a
proposed solution (watchdog restart and Linux kernel restart not covered,
reset cause deleted/hidden(?)); it is meant more as an idea to find the
cause. Power-on and cold restart work 100% of the time, warm restart never.

Regards
Konstantin


* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 18:34             ` Konstantin Kletschke
@ 2024-12-03 18:46               ` Ahmad Fatoum
  2024-12-03 19:03                 ` Konstantin Kletschke
  2024-12-04 11:07                 ` Konstantin Kletschke
  0 siblings, 2 replies; 22+ messages in thread
From: Ahmad Fatoum @ 2024-12-03 18:46 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

Hello Konstantin,

On 03.12.24 19:34, Konstantin Kletschke wrote:
> 
> Today tried the following for debugging purposes:
> 
> In am33xx_generic.c in arch/arm/mach-omap there is in
> am33xx_restart_soc() this:
> 
> writel(AM33XX_PRM_RSTCTRL_RESET, AM33XX_PRM_RSTCTRL);
> 
> which is 0x1 written to 0x44e00f00 and causes warm restart.
> 
> I can simulate this with "mw 0x44e00f00 0x1" which shows the freeze 
> I see (upon restart) on affected BBBs.

This happens without Linux first starting, right? So that invalidates
my theory of Linux reconfiguring the PMIC to something invalid.

> 
> When I change the value to 0x2 (cold restart) the affected BBBs restart
> successfully!

Nice. Do you know about https://barebox.org/doc/latest/user/system-reset.html ?

TL;DR: Cold reset is usually the preferred way to reset as it comes
with the least amount of surprises.

> Does this ring a bell for people more experienced? This is not meant as a
> proposed solution (watchdog restart and Linux kernel restart not covered,
> reset cause deleted/hidden(?)); it is meant more as an idea to find the
> cause. Power-on and cold restart work 100% of the time, warm restart never.

What does a cold reset do on an electrical level? Does it tell the PMIC
to do a reset?

Another thing I wonder about is what configuration the PMIC has on affected
boards and boards not affected. Can you use the I2C commands in barebox
to read the PMIC register set and compare it between the affected and
unaffected boards? Maybe they have different mask defaults?

Cheers,
Ahmad

> 
> Regards
> Konstantin
> 



* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 18:28             ` Ahmad Fatoum
@ 2024-12-03 18:51               ` Konstantin Kletschke
  2024-12-03 20:28                 ` Ahmad Fatoum
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-03 18:51 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Tue, Dec 03, 2024 at 07:28:24PM +0100, Ahmad Fatoum wrote:

> You can enable it for barebox.bin too, but enabling it for MLO only
> should work too.

Okay.

> > 2~�W-�,-H]
> >           ���k׋�ҫ�.LWC�C�C��arebox 2024.10.0-00152-g53c99b9f550b-dirty #15 Mon Dec 2 15:07:37 CET 2024
> 
> Hmm, CONFIG_BAUDRATE is set correctly?

CONFIG_BAUDRATE=115200, never changed that.

> If it doesn't work at a later place, it won't work when you move it
> earlier. I have a BBB here, so I can give enabling DEBUG_LL a try

Oh, then I misunderstood one of your initial mails; I thought that
moving it earlier could still produce some output even if the RAM setup
fails.

> too if you get stuck.

I managed to enable DEBUG_LL and run it, but the only difference is the
scrambled output in my terminal on power-up boot or when resetting with
"my cold restart" modification.
But wait, I do not know if it is a glitch: when CONFIG_DEBUG_LL is on and
I power on, I immediately see a "2" printed first in the terminal.
When the system gets stuck, I do not see any output at all, with
or without DEBUG_LL enabled.

I greatly appreciate your help getting CONFIG_DEBUG_LL to run.
I can bring a dozen BBBs to your office within a day; it's
nearly on my way from home to work :-D

Regards
Konsti


* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 18:46               ` Ahmad Fatoum
@ 2024-12-03 19:03                 ` Konstantin Kletschke
  2024-12-04 11:07                 ` Konstantin Kletschke
  1 sibling, 0 replies; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-03 19:03 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Tue, Dec 03, 2024 at 07:46:56PM +0100, Ahmad Fatoum wrote:

> > I can simulate this with "mw 0x44e00f00 0x1" which shows the freeze 
> > I see (upon restart) on affected BBBs.
> 
> This happens without Linux first starting, right? So that invalidates
> my theory of Linux reconfiguring the PMIC to something invalid.

Yes, this is all reproducible without Linux being involved.
It applies to S1, which is connected to NRESET_INOUT (warm restart), and
to barebox too (reset cmd, watchdog triggering).

> Nice. Do you know about https://barebox.org/doc/latest/user/system-reset.html ?
> 
> TL;DR: Cold reset is usually the preferred way to reset as it comes
> with the least amount of surprises.

No, not yet. I will investigate.
One could change this for the BBB, but the ugly part is that the hardware
reset button on the BBB, S1, triggers a warm restart. It is hardwired, if
I read it correctly.

> What does a cold reset do on an electrical level? Does it tell the PMIC
> to do a reset?

I am not sure, I will investigate.
There is a circuit with the CPU's PMIC_PWR_EN connected to the PMIC and
the PMIC's WAKEUP connected to the CPU. The PMIC's reset input is not
connected.

> Another thing I wonder about is what configuration the PMIC has on affected
> boards and boards not affected. Can you use the I2C commands in barebox
> to read the PMIC register set and compare it between the affected and
> unaffected boards? Maybe they have different mask defaults?

This is a good idea, I will do this tomorrow. I first have to get used to
reading the registers out of the PMIC. A comparison with u-boot may be of
interest in a second step too, but the idea about different mask defaults
is interesting, since the error hits 100% of the time on affected boards
and 0% on unaffected ones.
Could it be different RAM chips, for which the settings are on the edge?

Regards
Konsti






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 18:51               ` Konstantin Kletschke
@ 2024-12-03 20:28                 ` Ahmad Fatoum
  2024-12-03 21:45                   ` Konstantin Kletschke
  0 siblings, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-12-03 20:28 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

On 03.12.24 19:51, Konstantin Kletschke wrote:
> On Tue, Dec 03, 2024 at 07:28:24PM +0100, Ahmad Fatoum wrote:
> 
>> You can enable it for barebox.bin too, but enabling it for MLO only
>> should work too.
> 
> Okay.
> 
>>> 2~�W-�,-H]
>>>           ���k׋�ҫ�.LWC�C�C��arebox 2024.10.0-00152-g53c99b9f550b-dirty #15 Mon Dec 2 15:07:37 CET 2024
>>
>> Hmm, CONFIG_BAUDRATE is set correctly?
> 
> CONFIG_BAUDRATE=115200, never changed that.
> 
>> If it doesn't work at a later place, it won't work when you move it
>> earlier. I have a BBB here, so I can give enabling DEBUG_LL a try
> 
> Ou, then I have misunderstood one of your intial mails where I thought
> moving it earlier should change something, if RAM setup fails getting an
> output eventually despite of that.

- Try it first on a correctly working system and see that you get a >
  at the usual place.

- Try it on a broken system and see if you get a >

- If you don't, try on correct working system if you see > if you
  move it earlier

- Then try on broken system

>> too if you get stuck.
> 
> I manage to get DEBUG_LL on and run it, but the only difference is the
> scrambled output in my terminal on powerup boot or when resetting with 
> "my cold restart" modification. 
> But wait, I do not know if it is a glitch, when CONFIG_DEBUG_LL is on and
> I power on, I see a "2" immediately put out in terminal prepended first.
> When the system gets stuck it does it in a way I do not see anything with
> or without enabled in stuck case.

See the patch I just Cc'd you on.
 
> I extremely appreciate your help getting CONFIG_DEBUG_LL run.
> I can support with a dozen of BBBs within a day to your office, its
> nearly on my way from my home to my work :-D

I was particularly interested in DEBUG_LL, because that was a regression
as I know that it used to work.

I wish you best luck with debugging the actual issue you have now. :-)

Let me know how it goes,
Ahmad

> 
> Regards
> Konsti
> 





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 20:28                 ` Ahmad Fatoum
@ 2024-12-03 21:45                   ` Konstantin Kletschke
  2024-12-04  6:14                     ` Ahmad Fatoum
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-03 21:45 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Tue, Dec 03, 2024 at 09:28:39PM +0100, Ahmad Fatoum wrote:

> - Try it first on a correctly working system and see that you get a >
>   at the usual place.

Working system and comparison of its registers of PMIC with broken one
tomorrow. If still required.

> - Try it on a broken system and see if you get a >

YES. On the very first start I get it preceded by a '2' and followed by
the console note, then two empty lines and then the banner
"barebox 2024.10.0-00152-g53c99b9f550b-dirty #33 Tue Dec 3 22:01:26
CET 2024":


2>Switch to console [cs0]


barebox 2024.10.0[...]


> See the patch I just Cc'd you on.

Worked like a charm!

With vanilla code (one ">" in lowlevel.c) I get the LL output, but the
warm reset behaviour (stuck on S1, watchdog, reset cmd, Linux reboot) is
unchanged. That is expected, since this patch only deals with debugging.

Being the playful idiot child I am, I was so happy this was now working
that I wanted to see something more and added this to lowlevel.c:

@@ -135,6 +140,9 @@ static noinline int beaglebone_sram_init(void)
        am33xx_enable_uart0_pin_mux();
        omap_debug_ll_init();
        putc_ll('>');
+       putc_ll('6');
+       putc_ll('6');
+       putc_ll('6');

In other words, I added three '6' characters to the output, and they do show up!

The side effect of my three additional '6' characters is:

reset cmd, S1, wd 3, mw 0x44e00f00 0x1  and linux reboot are working fine now!
Additional LL debug output is there, console notification message, >, my sixes,
and it comes reliably up!

Kind Regards
Konstantin






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 21:45                   ` Konstantin Kletschke
@ 2024-12-04  6:14                     ` Ahmad Fatoum
  2024-12-04 16:29                       ` Konstantin Kletschke
  0 siblings, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-12-04  6:14 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

On 03.12.24 22:45, Konstantin Kletschke wrote:
> On Tue, Dec 03, 2024 at 09:28:39PM +0100, Ahmad Fatoum wrote:
>> See the patch I just Cc'd you on.
> 
> Worked like a charm!

Nice, you can reply with a

  Tested-by: Konstantin Kletschke <konstantin.kletschke@inside-m2m.de> # BBB

on the other patch, so it's picked up when the patch is applied.

> With vanilla code (one ">" in lowlevel.c) I get
> the LL output, the warm reset behaviour (stuck at S1, watchdog, reset
> cmd, linux reboot) is the same. As expected since its only debugging
> dealt with here.
> 
> Being the playful idiot child I am I was so happy this now working I wanted to
> see something more and added this in lowlevel.c:
> 
> @@ -135,6 +140,9 @@ static noinline int beaglebone_sram_init(void)
>         am33xx_enable_uart0_pin_mux();
>         omap_debug_ll_init();
>         putc_ll('>');
> +       putc_ll('6');
> +       putc_ll('6');
> +       putc_ll('6');
> 
> Aka added three 6 to be put out, which they are!
> 
> The side effect of my additional 3 6 chars is:
> 
> reset cmd, S1, wd 3, mw 0x44e00f00 0x1  and linux reboot are working fine now!
> Additional LL debug output is there, console notification message, >, my sixes,
> and it comes reliably up!

Very interesting. You can now try to move the 4 putc_ll`s to a later point in the
startup code and then see which line of barebox code needed the delay in front of
it.

Note that it's not safe to use puts_ll("string") before relocate_to_current_adr()
and setup_c() are called.

Cheers,
Ahmad


> 
> Kind Regards
> Konstantin
> 
> 





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-03 18:46               ` Ahmad Fatoum
  2024-12-03 19:03                 ` Konstantin Kletschke
@ 2024-12-04 11:07                 ` Konstantin Kletschke
  2024-12-04 11:20                   ` Konstantin Kletschke
  1 sibling, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-04 11:07 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Tue, Dec 03, 2024 at 07:46:56PM +0100, Ahmad Fatoum wrote:

> Another thing I wonder about is what configuration the PMIC has on affected
> boards and boards not affected. Can you use the I2C commands in barebox
> to read the PMIC register set and compare it between the affected and
> unaffected boards? Maybe they have different mask defaults?

This is an affected Board (warm restart freezing):

barebox@TI AM335x BeagleBone black:/ i2c_read -b0 -a 0x24 -r 0 -c 0x1e
0xe2 0x3e 0x01 0x01 0xb1 0x80 0xb2 0x01 0x00 0x00 0x04 0x00 0x7f 0x0c 0x18 0x08 0x08 0x06 0x09 0x38 0x26 0x3f 0x7f 0x00 0x03 0x15 0x5f 0x32 0x40 0x20
barebox@TI AM335x BeagleBone black:/ i2c_read -b0 -a 0x24 -r 0 -c 0x1e
0xe2 0x3e 0x00 0x01 0xb1 0x80 0xb2 0x01 0x00 0x00 0x04 0x00 0x7f 0x0c 0x18 0x08 0x08 0x06 0x09 0x38 0x26 0x3f 0x7f 0x00 0x03 0x15 0x5f 0x32 0x40 0x20



This is an unaffected Board (warm restart works):

barebox@TI AM335x BeagleBone black:/ i2c_read -b0 -a 0x24 -r 0 -c 0x1e
0xe2 0x3e 0x01 0x01 0xb1 0x80 0xb2 0x01 0x00 0x00 0x04 0x00 0x7f 0x0c 0x18 0x08 0x08 0x06 0x09 0x38 0x26 0x3f 0x7f 0x00 0x03 0x15 0x5f 0x32 0x40 0x20
barebox@TI AM335x BeagleBone black:/ i2c_read -b0 -a 0x24 -r 0 -c 0x1e
0xe2 0x3e 0x00 0x01 0xb1 0x80 0xb2 0x01 0x00 0x00 0x04 0x00 0x7f 0x0c 0x18 0x08 0x08 0x06 0x09 0x38 0x26 0x3f 0x7f 0x00 0x03 0x15 0x5f 0x32 0x40 0x20





I spot no difference, and both change INT from 1 to 0 on second read.

Kind Regards
Konstantin






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-04 11:07                 ` Konstantin Kletschke
@ 2024-12-04 11:20                   ` Konstantin Kletschke
  0 siblings, 0 replies; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-04 11:20 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Wed, Dec 04, 2024 at 12:07:44PM +0100, Konstantin Kletschke wrote:

> This is an unaffected Board (warm restart works):
> 
> barebox@TI AM335x BeagleBone black:/ i2c_read -b0 -a 0x24 -r 0 -c 0x1e
> 0xe2 0x3e 0x01 0x01 0xb1 0x80 0xb2 0x01 0x00 0x00 0x04 0x00 0x7f 0x0c 0x18 0x08 0x08 0x06 0x09 0x38 0x26 0x3f 0x7f 0x00 0x03 0x15 0x5f 0x32 0x40 0x20
> barebox@TI AM335x BeagleBone black:/ i2c_read -b0 -a 0x24 -r 0 -c 0x1e
> 0xe2 0x3e 0x00 0x01 0xb1 0x80 0xb2 0x01 0x00 0x00 0x04 0x00 0x7f 0x0c 0x18 0x08 0x08 0x06 0x09 0x38 0x26 0x3f 0x7f 0x00 0x03 0x15 0x5f 0x32 0x40 0x20

This is the same BBB, unaffected, read out with its shipped u-boot
(U-Boot 2019.04-00002-g31a8ae0206):

=> i2c md 0x24 0x0 0x1e
0000: e2 3f 01 01 b1 80 b2 01 00 00 04 00 7f 0c 18 11    .?..............
0010: 08 06 09 38 26 3f 7f 00 03 15 5f 32 40 20    ...8&?...._2@
=> i2c md 0x24 0x0 0x1e
0000: e2 3f 00 01 b1 80 b2 01 00 00 04 00 7f 0c 18 11    .?..............
0010: 08 06 09 38 26 3f 7f 00 03 15 5f 32 40 20    ...8&?...._2@
=> i2c md 0x24 0x0 0x1e
0000: e2 3f 00 01 b1 80 b2 01 00 00 04 00 7f 0c 18 11    .?..............
0010: 08 06 09 38 26 3f 7f 00 03 15 5f 32 40 20    ...8&?...._2@

I issued the command three times; register 0x2 also goes from 1 to 0
on the second read.

Differences in 
PPATH (barebox: 0x3e, u-boot: 0x3f) and
DEFDCDC2 (barebox: 0x08, u-boot: 0x11)

Wondering if this could be significant...

Regards
Konsti






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-04  6:14                     ` Ahmad Fatoum
@ 2024-12-04 16:29                       ` Konstantin Kletschke
  2024-12-10 21:52                         ` Ahmad Fatoum
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-04 16:29 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

On Wed, Dec 04, 2024 at 07:14:17AM +0100, Ahmad Fatoum wrote:

> Very interesting. You can now try to move the 4 putc_ll`s to a later point in the
> startup code and then see which line of barebox code needed the delay in front of
> it.

Dear Ahmad, I will try my best to put across what I did to the code and
where I get exactly stuck.

There is lowlevel.c with beaglebone_sram_init():

	am33xx_uart_soft_reset((void *)AM33XX_UART0_BASE);
	am33xx_enable_uart0_pin_mux();
	omap_debug_ll_init();
	putc_ll('>');
	putc_ll('6');
//	putc_ll('6');

	barebox_arm_entry(0x80000000, sdram_size, fdt);


Then I went to uncompress.c where there is barebox_pbl_start():

	void *handoff_data;

	putc_ll('A');
	/* piggy data is not relocated, so determine the bounds now */
	pg_start = runtime_address(input_data);
	pg_end = runtime_address(input_data_end);

	/*
	 * If we run from inside the memory just relocate the binary
	 * to the current address. Otherwise it may be a readonly location.
	 * Copy and relocate to the start of the memory in this case.
	 */
	if (pc > membase && pc - membase < memsize)
		relocate_to_current_adr();
	else
		relocate_to_adr(membase);

	pg_len = pg_end - pg_start;
	uncompressed_len = get_unaligned((const u32 *)(pg_start + pg_len - 4));

	putc_ll('B');
	setup_c();

	putc_ll('C');
	pr_debug("memory at 0x%08lx, size 0x%08lx\n", membase, memsize);

	putc_ll('D');
	if (IS_ENABLED(CONFIG_MMU))
		mmu_early_enable(membase, memsize);

In mmu_32.c there is mmu_early_enable():

	set_ttbr(ttb);

	putc_ll('E');
	/* For the XN bit to take effect, we can't be using DOMAIN_MANAGER. */
	if (cpu_architecture() >= CPU_ARCH_ARMv7)
		set_domain(DOMAIN_CLIENT);
	else
		set_domain(DOMAIN_MANAGER);

	putc_ll('F');
	/*
	 * This marks the whole address space as uncachable as well as
	 * unexecutable if possible
	 */
	create_flat_mapping();

	putc_ll('G');
	/* maps main memory as cachable */
	early_remap_range(membase, memsize - OPTEE_SIZE, MAP_CACHED);
	putc_ll('H');
	early_remap_range(membase + memsize - OPTEE_SIZE, OPTEE_SIZE, MAP_UNCACHED);
	putc_ll('I');
	early_remap_range(PAGE_ALIGN_DOWN((uintptr_t)_stext), PAGE_ALIGN(_etext - _stext), MAP_CACHED);
	putc_ll('J');

	__mmu_cache_on();
	putc_ll('K');

For early_remap_range() I end up in __arch_remap_range() there is:

	u32 pte_flags, pmd_flags;
	putc_ll('-');
	uint32_t *ttb = get_ttb();

	putc_ll('|');
	BUG_ON(!IS_ALIGNED(virt_addr, PAGE_SIZE));
	putc_ll('!');
	BUG_ON(!IS_ALIGNED(phys_addr, PAGE_SIZE));

	putc_ll('_');
	pte_flags = get_pte_flags(map_type);

Well, lets mark get_ttb():

static inline uint32_t *get_ttb(void)
{
	putc_ll('%');
	/* Clear unpredictable bits [13:0] */
	return (uint32_t *)(get_ttbr() & ~0x3fff);
}

I _think_ this is the critical path, I have more putc_ll() inserted, but
they are not important.
If by any chance it is better readable for anyone I could provide a
complete diff, of course.

So, when I power up, I get:

2>6ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]

before the banner.

When I reset, S1, blabla, I get:

>6ABCDEF%G-%|

So I assume it dies at 

BUG_ON(!IS_ALIGNED(virt_addr, PAGE_SIZE));

in __arch_remap_range() in mmu_32.c.

This leads to get_ttbr(void) in mmu_32.h, which contains something like
asm volatile ("mrc p15, 0, %0, c2, c0, 0" : "=r"(ttb));

*EEEK*

Now I triple-check with the second 6 enabled in lowlevel.c; I now change
it to 5, so:

	omap_debug_ll_init();
	putc_ll('>');
	putc_ll('6');
	putc_ll('5');

	barebox_arm_entry(0x80000000, sdram_size, fdt);

Powerup:
2>65ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]
reset:
>65ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]
S1:
>65ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]
mw 0x44e00f00 0x1:
�>65ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]
wd 1:
>65ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]
Booting linux, entering reboot there:
reboot: Restarting system
>65ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]

So each warm restart method gives me a proper reboot with one additional
putc_ll() in beaglebone_sram_init() in lowlevel.c.
The later debug putc_ll() calls have no influence on whether it starts or
not.

Kind regards
Konsti





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-04 16:29                       ` Konstantin Kletschke
@ 2024-12-10 21:52                         ` Ahmad Fatoum
  2024-12-11 14:52                           ` Konstantin Kletschke
  0 siblings, 1 reply; 22+ messages in thread
From: Ahmad Fatoum @ 2024-12-10 21:52 UTC (permalink / raw)
  To: Konstantin Kletschke; +Cc: barebox

Hi,

On 04.12.24 17:29, Konstantin Kletschke wrote:
> On Wed, Dec 04, 2024 at 07:14:17AM +0100, Ahmad Fatoum wrote:
> 
>> Very interesting. You can now try to move the 4 putc_ll`s to a later point in the
>> startup code and then see which line of barebox code needed the delay in front of
>> it.


> I _think_ this is the critical path, I have more putc_ll() inserted, but
> they are not important.
> If by any chance it is better readable for anyone I could provide a
> complete diff, of course.
> 
> So, when I power up, I get:
> 
> 2>6ABCDEF%G-%|!_H-%|!_I-%|!_%%%%JKLMNOPQZSwitch to console [cs0]
> 
> before the banner.
> 
> When I reset, S1, blabla, I get:
> 
> >6ABCDEF%G-%|
> 
> So I assume it dies at 
> 
> BUG_ON(!IS_ALIGNED(virt_addr, PAGE_SIZE));

You can print hex numbers with puthex_ll(), although by now you are
already relocated, so you can add

  pbl_set_putc((void (*)(void *, int))debug_ll_ns16550_putc, uart0);

after omap_debug_ll_init(), enable CONFIG_DEBUG_PBL and then you can
use normal printf and also see the panic message if the BUG() indeed
triggers.

> So each warm restart method gives me a proper reboot. 
> With an additional putc_ll() in lowlevel.c in beaglebone_sram_init().
> The later debug putc_ll() have no influence on starting not starting.

Getting stuck inside the BUG_ON doesn't make much sense. It may be interesting
to find out what the value of virt_addr is.

I suspect the RAM controller itself may be acting funky after a warm reboot
and letting some time go by fixes that :/

Cheers,
Ahmad

> 
> Kind regards
> Konsti
> 





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-10 21:52                         ` Ahmad Fatoum
@ 2024-12-11 14:52                           ` Konstantin Kletschke
  2024-12-20 11:05                             ` Konstantin Kletschke
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-11 14:52 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

Hi ;)

On Tue, Dec 10, 2024 at 10:52:04PM +0100, Ahmad Fatoum wrote:

> You can print hex numbers with puthex_ll(), although by now you are
> already relocated, so you can add
> 
>   pbl_set_putc((void (*)(void *, int))debug_ll_ns16550_putc, uart0);
> 
> after omap_debug_ll_init(), enable CONFIG_DEBUG_PBL and then you can
> use normal printf and also see the panic message if the BUG() indeed
> triggers.

I did enable CONFIG_PBL_CONSOLE and CONFIG_DEBUG_PBL in both barebox.bin
and MLO.

I get this:

2>6ABCuncompress.c: memory at 0x80000000, size 0x20000000
Dmmu: enabling MMU, ttb @ 0x9ffe0000
EF%G-%|!_H-%|!_I-%|!_%%%%JKLMendmem                = 0xa0000000
arm_mem_scratch       = 0x9fff8000+0x00008000
arm_mem_stack         = 0x9fff0000+0x00008000
arm_mem_ttb           = 0x9ffe0000+0x00010000
arm_mem_barebox_image = 0x9fe00000+0x00200000
arm_mem_early_malloc  = 0x9fde0000+0x00020000
membase               = 0x80000000+0x20000000
uncompress.c: uncompressing barebox binary at 0x402f65a0 (size 0x0000e3c8) to 0x9fe00000 (uncompressed size: 0x00018310)
NOPuncompress.c: jumping to uncompressed image at 0x9fe00001
QZSwitch to console [cs0]
[...] normal startup with more debug output continuing.


I left the single char debug sequence char output stuff I inserted as it
was last time.

Warm restart gives me this:

barebox@TI AM335x BeagleBone black:/ reset
>6ABCuncompress.c: memory at 0x8f001b00, size 0x9f00b500
Dmmu: enabling MMU, ttb @ 0x2dfe0000
EF%
[DEAD]

This looks a tiny bit different from last time, when the new PBL
CONFIG options were not enabled. It reaches

static inline uint32_t *get_ttb(void)
{
        putc_ll('%');
        /* Clear unpredictable bits [13:0] */
        return (uint32_t *)(get_ttbr() & ~0x3fff);
}

but not 

        uint32_t *ttb = get_ttb();

        putc_ll('|');

the line after this.
Last time it looked like it hit the BUG_ON, which I thought stopped the
execution because of messed-up addresses.

Here one sees that the addresses are already totally messed up in
uncompress.c.

Still it is possible to make soft start working by adding an additional
char in lowlevel.c:

        putc_ll('>');
        putc_ll('6');
        putc_ll('5');

makes it work,

        putc_ll('>');
        putc_ll('6');

or less gives the error.

I replaced all three chars with a hand-crafted __udelay in lowlevel.c
(but left debugging enabled); that makes everything work fine, too.



I was not able to add the line 
pbl_set_putc((void (*)(void *, int))debug_ll_ns16550_putc, uart0);
to lowlevel.c. I admit I asked ChatGPT to explain to me whether this
is still valid C, wow! :-)
Adding it gives an error:

arch/arm/boards/beaglebone/lowlevel.c:161:2: error: implicit declaration of function `pbl_set_putc´; did you mean `__set_bit´? [-Werror=implicit-function-declaration]
  pbl_set_putc((void (*)(void *, int))debug_ll_ns16550_putc, uart0);
  ^~~~~~~~~~~~
  __set_bit
arch/arm/boards/beaglebone/lowlevel.c:161:61: error: `uart0´ undeclared (first use in this function)
  pbl_set_putc((void (*)(void *, int))debug_ll_ns16550_putc, uart0);


> I suspect the RAM controller itself may be acting funky after a warm reboot
> and letting some time go by fixes that :/


Yes. But the PLL lock bits are properly set, I checked that already, and
the barebox code checks that too...

Regards
Konsti







^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Reset on Beaglebone Black has become unreliable/broken
  2024-12-11 14:52                           ` Konstantin Kletschke
@ 2024-12-20 11:05                             ` Konstantin Kletschke
  0 siblings, 0 replies; 22+ messages in thread
From: Konstantin Kletschke @ 2024-12-20 11:05 UTC (permalink / raw)
  To: Ahmad Fatoum; +Cc: barebox

I found the minimum change required to make barebox warm restart work
every time I hit S1, write the warm restart register, type reset or
reboot from the Linux kernel.
I first had to realize that in my yocto setup the MLO stage is a separate
package, so that I could carefully test my changes against this vanilla
defconfig.

My barebox-pbl stage package uses the defconfig am335x_mlo_defconfig, and
I apply only this patch on top of my 2022.04 version:

diff --git a/arch/arm/boards/beaglebone/lowlevel.c b/arch/arm/boards/beaglebone/lowlevel.c
index 544e396e03..329d7a9150 100644
--- a/arch/arm/boards/beaglebone/lowlevel.c
+++ b/arch/arm/boards/beaglebone/lowlevel.c
@@ -97,6 +97,12 @@ extern char __dtb_z_am335x_boneblack_start[];
 extern char __dtb_z_am335x_bone_common_start[];
 extern char __dtb_z_am335x_bone_start[];

+static void __udelay(int us)
+{
+	volatile int i;
+	for (i = 0; i < us * 3; i++);
+}
+
 /**
  * @brief The basic entry point for board initialization.
  *
@@ -142,6 +148,7 @@ static noinline int beaglebone_sram_init(void)
 	omap_uart_lowlevel_init((void *)AM33XX_UART0_BASE);
 	putc_ll('>');

+	__udelay(1000);
 	barebox_arm_entry(0x80000000, sdram_size, fdt);
 }


This delay loop (side quest: how do I calculate how long it waits?)
on its own with no changes in CONFIG or debugging or whatsoever
fixes everything.

Regards
Konstantin



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-12-20 11:15 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2024-11-28  9:07 Reset on Beaglebone Black has become unreliable/broken Konstantin Kletschke
2024-11-28  9:23 ` Ahmad Fatoum
2024-11-28  9:46   ` Konstantin Kletschke
2024-11-28 11:18     ` Ahmad Fatoum
2024-11-28 12:02       ` Konstantin Kletschke
2024-11-28 15:25         ` Konstantin Kletschke
2024-12-02 12:41         ` Ahmad Fatoum
2024-12-02 14:15           ` Konstantin Kletschke
2024-12-03 18:28             ` Ahmad Fatoum
2024-12-03 18:51               ` Konstantin Kletschke
2024-12-03 20:28                 ` Ahmad Fatoum
2024-12-03 21:45                   ` Konstantin Kletschke
2024-12-04  6:14                     ` Ahmad Fatoum
2024-12-04 16:29                       ` Konstantin Kletschke
2024-12-10 21:52                         ` Ahmad Fatoum
2024-12-11 14:52                           ` Konstantin Kletschke
2024-12-20 11:05                             ` Konstantin Kletschke
2024-12-03 18:34             ` Konstantin Kletschke
2024-12-03 18:46               ` Ahmad Fatoum
2024-12-03 19:03                 ` Konstantin Kletschke
2024-12-04 11:07                 ` Konstantin Kletschke
2024-12-04 11:20                   ` Konstantin Kletschke
