mail archive of the barebox mailing list
From: "Enrico Jörns" <ejo@pengutronix.de>
To: barebox@lists.infradead.org
Cc: ejo@pengutronix.de
Subject: [PATCH] docs: conf.py: tweak SearchEnglish to be hyphen- and dot-friendly
Date: Tue, 27 May 2025 09:52:05 +0200
Message-ID: <20250527075205.2915063-1-ejo@pengutronix.de>

This modifies the default indexer split() and js splitQuery()
methods to support searching for words with 'inner' hyphens or dots.

While this might not be an ideal, rock-solid, and fully future-proof
solution, since it relies on some upstream sphinx-doc internals to exist,
it allows searching for strings that include hyphens and dots, such as
'OP-TEE', 'nv.bootchooser.last_chosen', or 'barebox-state'.

Below is a more detailed explanation of the two modifications:

1) The default split regex in the sphinx-doc SearchLanguage base class
   is:

   | _word_re = re.compile(r'\w+')

   which we extend to include words with inner hyphens '-' and dots '.':

   | _word_re = re.compile(r'\w+(?:[\.\-]\w+)*')

   This will result in a searchindex.js that contains words with hyphens
   and dots.

2) The 'searchtool.js' code notes for its splitQuery() implementation:

   | /**
   |  * Default splitQuery function. Can be overridden in ``sphinx.search`` with a
   |  * custom function per language.
   |  *
   |  * The regular expression works by splitting the string on consecutive characters
   |  * that are not Unicode letters, numbers, underscores, or emoji characters.
   |  * This is the same as ``\W+`` in Python, preserving the surrogate pair area.
   |  */
   | if (typeof splitQuery === "undefined") {
   |   var splitQuery = (query) => query
   |       .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
   |       .filter(term => term)  // remove remaining empty strings
   | }

   The hook for overriding it is documented in the sphinx-doc
   'SearchLanguage' base class:

   |    .. attribute:: js_splitter_code
   |
   |       Return splitter function of JavaScript version.  The function should be
   |       named as ``splitQuery``.  And it should take a string and return list of
   |       strings.
   |
   |       .. versionadded:: 3.0

   We use this to define a splitQuery() function whose split pattern
   no longer splits on hyphens and dots, so they stay part of a term.
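
As a rough illustration of the two changes described above, the
following standalone Python sketch compares the default and extended
word patterns and emulates the query-side split. It is not part of the
patch: `split_query`, `QUERY_SPLIT_RE` and the sample text are made up
for this example, and Python's `\w` only approximates the
`\p{Letter}\p{Number}_` classes of the JavaScript regex (emoji handling
is left out).

```python
import re

# Default Sphinx indexer pattern, as quoted in item 1.
DEFAULT_WORD_RE = re.compile(r'\w+')
# Extended pattern from this patch: allow 'inner' hyphens and dots.
EXTENDED_WORD_RE = re.compile(r'\w+(?:[\.\-]\w+)*')

# Hypothetical Python stand-in for the JavaScript split pattern.
# Assumption: \w approximates \p{Letter}\p{Number}_ closely enough here.
QUERY_SPLIT_RE = re.compile(r'[^\w\-.]+')


def split_query(query):
    """Approximate the patched splitQuery(): split, then drop empty terms."""
    return [term for term in QUERY_SPLIT_RE.split(query) if term]


text = "Set nv.bootchooser.last_chosen via barebox-state or OP-TEE."

print(DEFAULT_WORD_RE.findall(text))    # index words without the patch
print(EXTENDED_WORD_RE.findall(text))   # index words with the patch
print(split_query("OP-TEE barebox-state"))
```

With the sample text, the default pattern yields 'nv', 'bootchooser'
and 'last_chosen' as separate words, while the extended one keeps
'nv.bootchooser.last_chosen', 'barebox-state' and 'OP-TEE' whole,
matching the terms produced by the query split. One difference worth
keeping in mind: the query split also keeps leading or trailing hyphens
and dots (for example '--help'), whereas the indexer pattern only
accepts them between word characters.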

We extend SearchEnglish (which extends SearchLanguage) here to retain
the stemmer code and stopwords for English.

Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
---
 Documentation/conf.py | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 5fb8b07c38..01c430dfa6 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -14,6 +14,7 @@
 
 import sys
 import os
+import re
 
 # If extensions (or modules to document with autodoc) are in another directory,
 # add these directories to sys.path here. If the directory is relative to the
@@ -260,3 +261,20 @@ texinfo_documents = [
 #texinfo_no_detailmenu = False
 
 highlight_language = 'none'
+
+from sphinx.search import SearchEnglish
+from sphinx.search import languages
+class DashFriendlySearchEnglish(SearchEnglish):
+
+    # Accept words that can include 'inner' hyphens or dots
+    _word_re = re.compile(r'[\w]+(?:[\.\-][\w]+)*')
+
+    js_splitter_code = r"""
+function splitQuery(query) {
+    return query
+        .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}\-\.]+/gu)
+        .filter(term => term.length > 0);
+}
+"""
+
+languages['en'] = DashFriendlySearchEnglish
-- 
2.39.5



