mail archive of the barebox mailing list
 help / color / mirror / Atom feed
* Boot hangs during sdhci_transfer_data_dma
@ 2022-06-20 14:33 Robin van der Gracht
  2022-06-21  7:46 ` Sascha Hauer
  0 siblings, 1 reply; 4+ messages in thread
From: Robin van der Gracht @ 2022-06-20 14:33 UTC (permalink / raw)
  To: Barebox; +Cc: Andrey Smirnov

Hi,

Today I tried to run barebox with CONFIG_KEYBOARD_GPIO=y added to my config.
and noticed my board hangs during boot. When I modify the probe function to
run without registering the poller[1] it boots as expected.

I started digging into the code to see how far the boot gets when I do
register the poller. I found that Barebox hangs in a do/while loop in
sdhci_transfer_data_dma[2].

The contents of the interrupt status (SDHCI_INT_STATUS) is 0 and stays that
way forever trapping the process in the loop.

Call stack:

initcall -> barebox_of_populate
   state_probe                                          drivers/misc/state.c
     state_new_from_node                                common/state/state.c
       of_find_path_by_node                             drivers/of/of_path.c
         __of_find_path                                 drivers/of/of_path.c
           device_detect                                drivers/base/driver.c
             mci_detect_card                            drivers/mci/mci-core.c
               mci_card_probe                           drivers/mci/mci-core.c
                 mci_startup                            drivers/mci/mci-core.c
                   mci_startup_mmc                      drivers/mci/mci-core.c
                     mmc_compare_ext_csds               drivers/mci/mci-core.c
                       mci_send_ext_csd                 drivers/mci/mci-core.c
                         mci_send_cmd                   drivers/mci/mci-core.c
                           esdhc_send_cmd               
drivers/mci/imx-esdhc.c
                             __esdhc_send_cmd    
drivers/mci/imx-esdhc-common.c
                               sdhci_transfer_data_dma  drivers/mci/sdhci.c


I'm not sure how this happens. It's not the first transfer taking place. I
figured that mayby the poller[1] just adds some cpu load that opens up a
window for this to occur.

Maybe something else cleared the status register right before we entered the
loop. Thats when I spotted this read/write construction[3]. It's executed
right before we enter the do/while loop and (over)writes to the irq status
register.

I removed the line with the write command[3] and my board boots as expected.
Why are we (over)writing the status register right after reading it?

Other theories on how this could occur are also very welcome :)

- Robin


1: 
https://git.pengutronix.de/cgit/barebox/tree/drivers/input/gpio_keys.c#n168
2: https://git.pengutronix.de/cgit/barebox/tree/drivers/mci/sdhci.c#n167
3: 
https://git.pengutronix.de/cgit/barebox/tree/drivers/mci/imx-esdhc-common.c#n179



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Boot hangs during sdhci_transfer_data_dma
  2022-06-20 14:33 Boot hangs during sdhci_transfer_data_dma Robin van der Gracht
@ 2022-06-21  7:46 ` Sascha Hauer
  2022-06-21  9:32   ` Robin van der Gracht
  2022-06-21 10:07   ` Robin van der Gracht
  0 siblings, 2 replies; 4+ messages in thread
From: Sascha Hauer @ 2022-06-21  7:46 UTC (permalink / raw)
  To: Robin van der Gracht; +Cc: Barebox, Andrey Smirnov

Hi Robin,

On Mon, Jun 20, 2022 at 04:33:02PM +0200, Robin van der Gracht wrote:
> Hi,
> 
> Today I tried to run barebox with CONFIG_KEYBOARD_GPIO=y added to my config.
> and noticed my board hangs during boot. When I modify the probe function to
> run without registering the poller[1] it boots as expected.
> 
> I started digging into the code to see how far the boot gets when I do
> register the poller. I found that Barebox hangs in a do/while loop in
> sdhci_transfer_data_dma[2].
> 
> The contents of the interrupt status (SDHCI_INT_STATUS) is 0 and stays that
> way forever trapping the process in the loop.
> 
> Call stack:
> 
> initcall -> barebox_of_populate
>   state_probe                                          drivers/misc/state.c
>     state_new_from_node                                common/state/state.c
>       of_find_path_by_node                             drivers/of/of_path.c
>         __of_find_path                                 drivers/of/of_path.c
>           device_detect                                drivers/base/driver.c
>             mci_detect_card                            drivers/mci/mci-core.c
>               mci_card_probe                           drivers/mci/mci-core.c
>                 mci_startup                            drivers/mci/mci-core.c
>                   mci_startup_mmc                      drivers/mci/mci-core.c
>                     mmc_compare_ext_csds               drivers/mci/mci-core.c
>                       mci_send_ext_csd                 drivers/mci/mci-core.c
>                         mci_send_cmd                   drivers/mci/mci-core.c
>                           esdhc_send_cmd
> drivers/mci/imx-esdhc.c
>                             __esdhc_send_cmd
> drivers/mci/imx-esdhc-common.c
>                               sdhci_transfer_data_dma  drivers/mci/sdhci.c
> 
> 
> I'm not sure how this happens. It's not the first transfer taking place. I
> figured that mayby the poller[1] just adds some cpu load that opens up a
> window for this to occur.
> 
> Maybe something else cleared the status register right before we entered the
> loop. Thats when I spotted this read/write construction[3]. It's executed
> right before we enter the do/while loop and (over)writes to the irq status
> register.
> 
> I removed the line with the write command[3] and my board boots as expected.
> Why are we (over)writing the status register right after reading it?

The idea is likely that we clear the interrupts that we just handled. It
seems by the time the status register is overwritten the DMA transfer is
already ongoing, and in your case even already done. We should only ever
clear the bits we have handled, like sdhci_transfer_data_dma() does with

	sdhci_write32(sdhci, SDHCI_INT_STATUS, SDHCI_INT_DMA);

Apart from this line the code never checks the same bit twice, so
clearing anything shoudn't be necessary. Clearing the status register
once either at the start or the end of the function should be enough.

I think the right thing to do is just to remove the erroneous status
register write like you already did.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Boot hangs during sdhci_transfer_data_dma
  2022-06-21  7:46 ` Sascha Hauer
@ 2022-06-21  9:32   ` Robin van der Gracht
  2022-06-21 10:07   ` Robin van der Gracht
  1 sibling, 0 replies; 4+ messages in thread
From: Robin van der Gracht @ 2022-06-21  9:32 UTC (permalink / raw)
  To: Sascha Hauer; +Cc: Barebox, Andrey Smirnov

Hi Sascha,

On 2022-06-21 09:46, Sascha Hauer wrote:
> Hi Robin,
> 
> On Mon, Jun 20, 2022 at 04:33:02PM +0200, Robin van der Gracht wrote:
>> Hi,
>> 
>> Today I tried to run barebox with CONFIG_KEYBOARD_GPIO=y added to my 
>> config.
>> and noticed my board hangs during boot. When I modify the probe function to
>> run without registering the poller[1] it boots as expected.
>> 
>> I started digging into the code to see how far the boot gets when I do
>> register the poller. I found that Barebox hangs in a do/while loop in
>> sdhci_transfer_data_dma[2].
>> 
>> The contents of the interrupt status (SDHCI_INT_STATUS) is 0 and stays that
>> way forever trapping the process in the loop.
>> 
>> Call stack:
>> 
>> initcall -> barebox_of_populate
>>   state_probe                                          drivers/misc/state.c
>>     state_new_from_node                                common/state/state.c
>>       of_find_path_by_node                             drivers/of/of_path.c
>>         __of_find_path                                 drivers/of/of_path.c
>>           device_detect                                
>> drivers/base/driver.c
>>             mci_detect_card                            
>> drivers/mci/mci-core.c
>>               mci_card_probe                           
>> drivers/mci/mci-core.c
>>                 mci_startup                            
>> drivers/mci/mci-core.c
>>                   mci_startup_mmc                      
>> drivers/mci/mci-core.c
>>                     mmc_compare_ext_csds               
>> drivers/mci/mci-core.c
>>                       mci_send_ext_csd                 
>> drivers/mci/mci-core.c
>>                         mci_send_cmd                   
>> drivers/mci/mci-core.c
>>                           esdhc_send_cmd
>> drivers/mci/imx-esdhc.c
>>                             __esdhc_send_cmd
>> drivers/mci/imx-esdhc-common.c
>>                               sdhci_transfer_data_dma  drivers/mci/sdhci.c
>> 
>> 
>> I'm not sure how this happens. It's not the first transfer taking place. I
>> figured that mayby the poller[1] just adds some cpu load that opens up a
>> window for this to occur.
>> 
>> Maybe something else cleared the status register right before we entered 
>> the
>> loop. Thats when I spotted this read/write construction[3]. It's executed
>> right before we enter the do/while loop and (over)writes to the irq status
>> register.
>> 
>> I removed the line with the write command[3] and my board boots as 
>> expected.
>> Why are we (over)writing the status register right after reading it?
> 
> The idea is likely that we clear the interrupts that we just handled. It
> seems by the time the status register is overwritten the DMA transfer is
> already ongoing, and in your case even already done.

I can confirm this. When the data transfer is still ongoing the status
register holds 0x000000001 (SDHCI_INT_CMD_COMPLETE). A few lines above
the write there is a busy wait for this bit to be set.

When my board hangs the status register holds 0x0000000b at the timen it is
cleared. Which means the DMA engine has stopped on a buffer boundary.
Apparently this can sometimes happen shortly after starting.

> We should only ever
> clear the bits we have handled, like sdhci_transfer_data_dma() does with
> 
> 	sdhci_write32(sdhci, SDHCI_INT_STATUS, SDHCI_INT_DMA);

Ack. This would suffice:

sdhci_write32(&host->sdhci, SDHCI_INT_STATUS, SDHCI_INT_CMD_COMPLETE);


> 
> Apart from this line the code never checks the same bit twice, so
> clearing anything shoudn't be necessary. Clearing the status register
> once either at the start or the end of the function should be enough.
> 
> I think the right thing to do is just to remove the erroneous status
> register write like you already did.

I'll submit a patch that removes the write. Thank you for thinking along.

- Robin



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Boot hangs during sdhci_transfer_data_dma
  2022-06-21  7:46 ` Sascha Hauer
  2022-06-21  9:32   ` Robin van der Gracht
@ 2022-06-21 10:07   ` Robin van der Gracht
  1 sibling, 0 replies; 4+ messages in thread
From: Robin van der Gracht @ 2022-06-21 10:07 UTC (permalink / raw)
  To: Sascha Hauer; +Cc: Barebox, Andrey Smirnov

On 2022-06-21 09:46, Sascha Hauer wrote:
> Hi Robin,
...
> We should only ever
> clear the bits we have handled, like sdhci_transfer_data_dma() does with
> 
> 	sdhci_write32(sdhci, SDHCI_INT_STATUS, SDHCI_INT_DMA);

I just noticed that the tegra-sdmmc mci driver might have the same issue.

https://git.pengutronix.de/cgit/barebox/tree/drivers/mci/tegra-sdmmc.c#n149

It can prevent the following conditional from ever evaluating true trapping
the process in the while loop.
https://git.pengutronix.de/cgit/barebox/tree/drivers/mci/tegra-sdmmc.c#n203

This driver clears the status register on error (not on start/end of 
function)
with an old value 'val' which can be lacking status bits that popped up
along the way...

I can't test code changes for this platform.

- Robin



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-06-21 10:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 14:33 Boot hangs during sdhci_transfer_data_dma Robin van der Gracht
2022-06-21  7:46 ` Sascha Hauer
2022-06-21  9:32   ` Robin van der Gracht
2022-06-21 10:07   ` Robin van der Gracht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox