mail archive of the barebox mailing list
* [PATCH 1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters
@ 2026-01-15 12:05 Ahmad Fatoum
  2026-01-15 12:05 ` [PATCH 2/2] virtio: use DMA coherent APIs Ahmad Fatoum
  2026-01-19 10:20 ` [PATCH 1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters Sascha Hauer
  0 siblings, 2 replies; 3+ messages in thread
From: Ahmad Fatoum @ 2026-01-15 12:05 UTC (permalink / raw)
  To: barebox; +Cc: Ahmad Fatoum

If a device is DMA-capable and cache-coherent, it can be considerably
faster to keep shared memory cached, instead of mapping it uncached
unconditionally like we currently do.

This was very noticeable when using Virt I/O with KVM acceleration as
described in commit 3ebd05809a49 ("virtio: don't use DMA API unless
required").

In preparation for simplifying the code in the aforementioned commit,
consult dev_is_dma_coherent() before doing cache maintenance.

Signed-off-by: Ahmad Fatoum <a.fatoum@barebox.org>
---
 arch/arm/cpu/mmu-common.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm/cpu/mmu-common.c b/arch/arm/cpu/mmu-common.c
index a1431c0ff461..2b22ab47cac8 100644
--- a/arch/arm/cpu/mmu-common.c
+++ b/arch/arm/cpu/mmu-common.c
@@ -50,9 +50,11 @@ void *dma_alloc_map(struct device *dev,
 		*dma_handle = (dma_addr_t)ret;
 
 	memset(ret, 0, size);
-	dma_flush_range(ret, size);
 
-	remap_range(ret, size, map_type);
+	if (!dev_is_dma_coherent(dev)) {
+		dma_flush_range(ret, size);
+		remap_range(ret, size, map_type);
+	}
 
 	return ret;
 }
@@ -70,8 +72,8 @@ void *dma_alloc_coherent(struct device *dev,
 void dma_free_coherent(struct device *dev,
 		       void *mem, dma_addr_t dma_handle, size_t size)
 {
-	size = PAGE_ALIGN(size);
-	remap_range(mem, size, MAP_CACHED);
+	if (!dev_is_dma_coherent(dev))
+		remap_range(mem, PAGE_ALIGN(size), MAP_CACHED);
 
 	free(mem);
 }
-- 
2.47.3

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 2/2] virtio: use DMA coherent APIs
  2026-01-15 12:05 [PATCH 1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters Ahmad Fatoum
@ 2026-01-15 12:05 ` Ahmad Fatoum
  2026-01-19 10:20 ` [PATCH 1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters Sascha Hauer
  1 sibling, 0 replies; 3+ messages in thread
From: Ahmad Fatoum @ 2026-01-15 12:05 UTC (permalink / raw)
  To: barebox; +Cc: Ahmad Fatoum

This requires that barebox can actually tell DMA-coherent devices apart
from non-coherent ones, so select OF_DMA_COHERENCY where applicable.

Signed-off-by: Ahmad Fatoum <a.fatoum@barebox.org>
---
 drivers/virtio/Kconfig       |  1 +
 drivers/virtio/virtio.c      |  4 +++
 drivers/virtio/virtio_ring.c | 62 +++++++-----------------------------
 3 files changed, 16 insertions(+), 51 deletions(-)

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index ecf66987b3ed..e39ec863f003 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config VIRTIO
 	bool
+	select OF_DMA_COHERENCY if OF
 	help
 	  This option is selected by any driver which implements the virtio
 	  bus, such as CONFIG_VIRTIO_MMIO, CONFIG_VIRTIO_PCI.
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 4abf551a2834..262c09c9fe58 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -138,6 +138,10 @@ int virtio_finalize_features(struct virtio_device *dev)
 			return -ENODEV;
 		}
 
+		/* When this changes in the future with support for IOMMUs
+		 * in emulation, make sure to adapt vring_alloc_queue() so
+		 * it ignores IOMMUs if virtio_has_dma_quirk().
+		 */
 		if (!virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
 			dev_warn(&dev->dev,
 				 "device must provide VIRTIO_F_ACCESS_PLATFORM\n");
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 8b6469f54d2a..8953d43c440c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,9 +23,14 @@
 #define vq_info(vq, fmt, ...) \
 	dev_info(&vq->vdev->dev, fmt, ##__VA_ARGS__)
 
+static inline struct device *virtio_dma_dev(const struct virtio_device *vdev)
+{
+	return vdev->dev.parent;
+}
+
 static inline struct device *vring_dma_dev(const struct virtqueue *vq)
 {
-	return vq->vdev->dev.parent;
+	return virtio_dma_dev(vq->vdev);
 }
 
 /* Map one sg entry. */
@@ -330,11 +335,6 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
  * On most systems with virtio, physical addresses match bus addresses,
  * and it _shouldn't_ particularly matter whether we use the DMA API.
  *
- * However, barebox' dma_alloc_coherent doesn't yet take a device pointer
- * as argument, so even for dma-coherent devices, the virtqueue is mapped
- * uncached on ARM. This has considerable impact on the Virt I/O performance,
- * so we really want to avoid using the DMA API if possible for the time being.
- *
  * On some systems, including Xen and any system with a physical device
  * that speaks virtio behind a physical IOMMU, we must use the DMA API
  * for virtio DMA to work at all.
@@ -344,60 +344,20 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
  * ignores the IOMMU, so we must either pretend that the IOMMU isn't
  * there or somehow map everything as the identity.
  *
- * For the time being, we preserve historic behavior and bypass the DMA
- * API.
- *
- * TODO: install a per-device DMA ops structure that does the right thing
- * taking into account all the above quirks, and use the DMA API
- * unconditionally on data path.
+ * As we do not support IOMMUs yet and dma_alloc_coherent takes a device
+ * pointer that enables us to do cached DMA, just use the DMA API
+ * unconditionally for now.
  */
-
-static bool vring_use_dma_api(const struct virtio_device *vdev)
-{
-	return !virtio_has_dma_quirk(vdev);
-}
-
 static void *vring_alloc_queue(struct virtio_device *vdev,
 			       size_t size, dma_addr_t *dma_handle)
 {
-	if (vring_use_dma_api(vdev)) {
-		return dma_alloc_coherent(DMA_DEVICE_BROKEN, size, dma_handle);
-	} else {
-		void *queue = memalign(PAGE_SIZE, PAGE_ALIGN(size));
-
-		if (queue) {
-			phys_addr_t phys_addr = virt_to_phys(queue);
-			*dma_handle = (dma_addr_t)phys_addr;
-
-			memset(queue, 0x00, PAGE_ALIGN(size));
-
-			/*
-			 * Sanity check: make sure we dind't truncate
-			 * the address.  The only arches I can find that
-			 * have 64-bit phys_addr_t but 32-bit dma_addr_t
-			 * are certain non-highmem MIPS and x86
-			 * configurations, but these configurations
-			 * should never allocate physical pages above 32
-			 * bits, so this is fine.  Just in case, throw a
-			 * warning and abort if we end up with an
-			 * unrepresentable address.
-			 */
-			if (WARN_ON_ONCE(*dma_handle != phys_addr)) {
-				free(queue);
-				return NULL;
-			}
-		}
-		return queue;
-	}
+	return dma_alloc_coherent(virtio_dma_dev(vdev), size, dma_handle);
 }
 
 static void vring_free_queue(struct virtio_device *vdev,
 			     size_t size, void *queue, dma_addr_t dma_handle)
 {
-	if (vring_use_dma_api(vdev))
-		dma_free_coherent(DMA_DEVICE_BROKEN, queue, dma_handle, size);
-	else
-		free(queue);
+	dma_free_coherent(virtio_dma_dev(vdev), queue, dma_handle, size);
 }
 
 struct virtqueue *vring_create_virtqueue(unsigned int index, unsigned int num,
-- 
2.47.3


* Re: [PATCH 1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters
  2026-01-15 12:05 [PATCH 1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters Ahmad Fatoum
  2026-01-15 12:05 ` [PATCH 2/2] virtio: use DMA coherent APIs Ahmad Fatoum
@ 2026-01-19 10:20 ` Sascha Hauer
  1 sibling, 0 replies; 3+ messages in thread
From: Sascha Hauer @ 2026-01-19 10:20 UTC (permalink / raw)
  To: barebox, Ahmad Fatoum


On Thu, 15 Jan 2026 13:05:52 +0100, Ahmad Fatoum wrote:
> If a device is DMA-capable and cache-coherent, it can be considerably
> faster to keep shared memory cached, instead of mapping it uncached
> unconditionally like we currently do.
> 
> This was very noticeable when using Virt I/O with KVM acceleration as
> described in commit 3ebd05809a49 ("virtio: don't use DMA API unless
> required").
> 
> [...]

Applied, thanks!

[1/2] ARM: mmu: optimize dma_alloc_coherent for cache-coherent DMA masters
      https://git.pengutronix.de/cgit/barebox/commit/?id=aefa14324910 (link may not be stable)
[2/2] virtio: use DMA coherent APIs
      https://git.pengutronix.de/cgit/barebox/commit/?id=c56e7af5fa98 (link may not be stable)

Best regards,
-- 
Sascha Hauer <s.hauer@pengutronix.de>

