Rule based number formatting by blagasz · Pull Request #1232 · python-babel/babel

blagasz · 2025-09-28T18:17:53Z

Number spelling based on the pure Python implementation of the CLDR RBNF engine.

Currently only supports whole numbers (cardinal, ordinal, year) as fractional numbers are not well-defined and not universally implemented in CLDR.

Based on the original PR #660 with useful enhancements from @akx at #682 and additional work from the original author:

- extensive generated test cases
- a dataclass that makes the parsing context more readable in code
- linting and debugging, including the removal of recursion errors

Extensive test cases are added, but should be considered smoke tests until a native speaker reviews them for each language.

Based on an earlier discussion: #114 and referenced in #179

Supersedes #682

Changes from the original Babel PR python-babel#660:

- Recursion bugs removed by improved rule parsing and routing; the recursion exception is now handled
- Context improved by moving to a dataclass
- Divisor computation fixed by adding a precision context
- Plural tokens are now parsed
- Typos fixed and comments improved

codecov · 2025-09-28T19:09:57Z

Codecov Report

❌ Patch coverage is 97.88732% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.27%. Comparing base (50635d8) to head (bb56a15).
⚠️ Report is 20 commits behind head on master.

Files with missing lines	Patch %	Lines
babel/rbnf.py	97.83%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1232      +/-   ##
==========================================
+ Coverage   91.93%   92.27%   +0.33%     
==========================================
  Files          27       28       +1     
  Lines        4688     4972     +284     
==========================================
+ Hits         4310     4588     +278     
- Misses        378      384       +6
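
The percentages in the diff above can be re-derived from the hit and line counts. The sketch below assumes, as an inference from the numbers rather than anything the report states, that the displayed values are truncated to two decimals instead of rounded; the helper names are made up for illustration.

```python
from fractions import Fraction
from math import floor


def pct(hits: int, lines: int) -> Fraction:
    """Exact coverage percentage, kept as a fraction to avoid float drift."""
    return Fraction(hits * 100, lines)


def truncate(value: Fraction, places: int = 2) -> float:
    """Cut off (rather than round) after `places` decimals."""
    scale = 10 ** places
    return floor(value * scale) / scale


base = pct(4310, 4688)   # master: hits / lines
head = pct(4588, 4972)   # this PR: hits / lines
patch = pct(278, 284)    # the +278 hits out of the +284 added lines

print(truncate(base))         # 91.93
print(truncate(head))         # 92.27
print(truncate(head - base))  # 0.33
print(truncate(patch, 5))     # 97.88732
```

The delta only comes out as +0.33 when it is computed from the exact values and truncated afterwards; subtracting the two displayed figures would give 0.34.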

Flag	Coverage Δ
macos-14-3.10	`91.15% <94.01%> (+0.17%)`	⬆️
macos-14-3.11	`91.31% <97.88%> (+0.39%)`	⬆️
macos-14-3.12	`91.51% <97.88%> (+0.38%)`	⬆️
macos-14-3.13	`91.51% <97.88%> (+0.38%)`	⬆️
macos-14-3.8	`91.02% <94.01%> (+0.18%)`	⬆️
macos-14-3.9	`91.08% <94.01%> (+0.17%)`	⬆️
macos-14-pypy3.10	`91.15% <94.01%> (+0.17%)`	⬆️
ubuntu-24.04-3.10	`91.17% <94.01%> (+0.17%)`	⬆️
ubuntu-24.04-3.11	`91.33% <97.88%> (+0.39%)`	⬆️
ubuntu-24.04-3.12	`91.53% <97.88%> (+0.38%)`	⬆️
ubuntu-24.04-3.13	`91.53% <97.88%> (+0.38%)`	⬆️
ubuntu-24.04-3.8	`91.04% <94.01%> (+0.18%)`	⬆️
ubuntu-24.04-3.9	`91.10% <94.01%> (+0.17%)`	⬆️
ubuntu-24.04-pypy3.10	`91.17% <94.01%> (+0.17%)`	⬆️
windows-2022-3.10	`91.16% <94.01%> (+0.17%)`	⬆️
windows-2022-3.11	`91.32% <97.88%> (+0.39%)`	⬆️
windows-2022-3.12	`91.52% <97.88%> (+0.38%)`	⬆️
windows-2022-3.13	`91.52% <97.88%> (+0.38%)`	⬆️
windows-2022-3.8	`91.13% <94.01%> (+0.17%)`	⬆️
windows-2022-3.9	`91.09% <94.01%> (+0.17%)`	⬆️
windows-2022-pypy3.10	`91.16% <94.01%> (+0.17%)`	⬆️


akx

Thanks for reviving the number spelling challenge, and sorry it took me so long to get to reviewing this.

Beyond the comments within (mostly lack of type hints etc, but some other things too), please remove the non-human-audited (were there audited ones?) rbnf_test_cases? We don't want to add all of that to the repo here (especially if/when we don't know if they're correct). The script to generate them should naturally stay there (some comments for the script too).

I did take a quick look at the test cases for my native Finnish, and they look correct :)

akx · 2026-04-16T12:39:25Z

 SPACE_CHARS_RE = re.compile('|'.join(SPACE_CHARS))


+def spell_number(number, locale=LC_NUMERIC, ruleset=None):


Could you add type annotations here, please?
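
One possible shape for the annotated signature, as a sketch only. `Locale` and `LC_NUMERIC` below are local stand-ins so the snippet runs without Babel installed, and the accepted `number` types are a guess based on the PR description (whole numbers only); the real signature is up to the author.

```python
import inspect
from decimal import Decimal
from typing import Optional, Union


class Locale:
    """Stand-in for babel.core.Locale (assumption, for illustration only)."""


LC_NUMERIC: Optional[str] = "en_US"  # stand-in for babel.numbers.LC_NUMERIC


def spell_number(
    number: Union[int, Decimal],
    locale: Union[Locale, str, None] = LC_NUMERIC,
    ruleset: Optional[str] = None,
) -> str:
    """Signature-only sketch; the actual body is the one in the PR."""
    raise NotImplementedError


# Show the resulting signature as a type checker or IDE would see it.
print(inspect.signature(spell_number))
```

With `from __future__ import annotations` at the top of the module, the same hints could be written as `int | Decimal` and `str | None`.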

akx · 2026-04-16T12:40:29Z

+    return speller.format(number, ruleset=ruleset)
+
+
+def get_rbnf_rules(locale=LC_NUMERIC):


I'm not sure this needs to be a public API? (If it is, it also locks down the public API for RuleBasedNumberFormat.) Or... WDYT, what sort of app would consume the raw rules?

akx · 2026-04-16T12:42:04Z

        territory_languages[territory.attrib['type']] = languages
+
+
+    # To help the negotiation in `babel.numbers.spell_number`


spell_number doesn't seem to directly read this; can you fix up the comment?

akx · 2026-04-16T12:44:39Z

+        # there will be no rbnf rules for all locales
+        # there could be a separate iteration for rbnf rule files


I'm not sure I'm totally following this comment?

akx · 2026-04-16T12:45:57Z

+
+# Undocumented syntax (←%rule-name←←)
+# Trac ticket filed for CLDR update PL rbnf
+#     http://unicode.org/cldr/trac/ticket/10544


https://unicode-org.atlassian.net/browse/CLDR-10544 is saying this was resolved in https://unicode-org.atlassian.net/browse/CLDR-8909 and rules are flat now - can you check if these comments are still relevant?

akx · 2026-04-16T13:56:07Z

+test_cases = None
+
+def get_test_cases(
+        template_toml_path: str = None,
+        ruleset_name: str = None,
+    ) -> dict:
+
+    global test_cases
+
+    if test_cases is None:
+        if template_toml_path is None:
+            with open("tests/rbnf_test_cases/_template.toml", "rb") as f:
+                test_cases_temp = tomllib.load(f)
+                test_cases = test_cases_temp
+        else:
+            with open(template_toml_path, "rb") as f:
+                test_cases_temp = tomllib.load(f)
+    else:
+        test_cases_temp = test_cases
+
+    return test_cases_temp[get_mapped_ruleset_name(ruleset_name)].items()


Looks like this simplifies to something like
(import functools.cache and pathlib.Path)

Suggested change

+@cache
+def _get_test_cases_template(path: str = "tests/rbnf_test_cases/_template.toml"):
+    return tomllib.loads(Path(path).read_text(encoding="utf-8"))
+
+def get_test_cases(ruleset_name: str):
+    return _get_test_cases_template()[get_mapped_ruleset_name(ruleset_name)]

akx · 2026-04-16T13:56:24Z

+def get_mapped_ruleset_name(ruleset: str) -> str:
+    print(ruleset)
+    mapping = {
+        "spellout-numbering-year": "year",
+        "spellout-numbering": "numbering",
+        "spellout-ordinal": "ordinal",
+        "spellout-cardinal": "cardinal",
+    }
+    for k, v in mapping.items():
+        if ruleset.startswith(k):
+            return v
+    return 'numbering'  # default fallback


Suggested change

+mapping = {
+    "spellout-numbering-year": "year",
+    "spellout-numbering": "numbering",
+    "spellout-ordinal": "ordinal",
+    "spellout-cardinal": "cardinal",
+}
+
+def get_mapped_ruleset_name(ruleset: str) -> str:
+    for k, v in mapping.items():
+        if ruleset.startswith(k):
+            return v
+    return 'numbering'  # default fallback
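
One thing worth keeping in mind with the prefix lookup: it depends on dict insertion order, because `"spellout-numbering"` is itself a prefix of `"spellout-numbering-year"`. The sketch below illustrates this and shows a longest-prefix variant that does not depend on ordering. `first_match` and `longest_match` are illustrative names, not part of the PR.

```python
RULESET_PREFIXES = {
    "spellout-numbering-year": "year",  # must come before "spellout-numbering"
    "spellout-numbering": "numbering",
    "spellout-ordinal": "ordinal",
    "spellout-cardinal": "cardinal",
}


def first_match(ruleset: str, prefixes: dict = RULESET_PREFIXES) -> str:
    """Return the value of the first prefix (in insertion order) that matches."""
    for prefix, name in prefixes.items():
        if ruleset.startswith(prefix):
            return name
    return "numbering"  # default fallback


def longest_match(ruleset: str, prefixes: dict = RULESET_PREFIXES) -> str:
    """Order-independent alternative: pick the longest matching prefix."""
    matches = [p for p in prefixes if ruleset.startswith(p)]
    return prefixes[max(matches, key=len)] if matches else "numbering"


# Same entries, reversed insertion order.
reordered = dict(reversed(list(RULESET_PREFIXES.items())))

print(first_match("spellout-numbering-year"))               # year
print(first_match("spellout-numbering-year", reordered))    # numbering
print(longest_match("spellout-numbering-year", reordered))  # year
print(first_match("spellout-cardinal-feminine"))            # cardinal
```

With the mapping written in the order shown in the suggestion, both variants agree; the longest-prefix form only matters if the dict is ever reordered or extended.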

akx · 2026-04-16T13:56:48Z

+def generate_test_for_locale(
+        locale: Locale,
+        output_toml_path: str,
+        test_cases: dict = None,
+
+
+) -> None:


Suggested change

+def generate_test_for_locale(
+        locale: Locale,
+        output_toml_path: str,
+) -> None:

akx · 2026-04-16T13:57:34Z

+            try:
+                v2 = speller.format(k, ruleset=ruleset)
+                print(f"    {k} : '{v2}'")
+                lines.append(f'{k} = "{v2}"')
+            except RBNFError as e:
+                print(k, locale, ruleset, e)
+                input()


Suggested change

+            v2 = speller.format(k, ruleset=ruleset)
+            lines.append(f'{k} = "{v2}"')

akx · 2026-04-16T13:58:45Z

+def generate_all_tests(
+        test_cases_toml_path: str = "tests/rbnf_test_cases/_template.toml",
+        output_dir: str = "tests/rbnf_test_cases/",
+) -> None:
+
+    for locale in list(get_global('rbnf_locales')):
+        output_toml_path = os.path.join(output_dir, f"{locale}.toml")
+        generate_test_for_locale(locale, output_toml_path)


Suggested change

+def generate_all_tests() -> None:
+    for locale in get_global('rbnf_locales'):
+        generate_test_for_locale(locale, f"tests/rbnf_test_cases/{locale}.toml")

blagasz and others added 11 commits September 11, 2025 08:24

initial commit for rework

79c20e7

changes from original Babel PR python-babel#660

rbnf: light clean up

73c2b84

rbnf: correct radix reading

7f15c5d

rbnf: make spell_number API less kwargsy

78ed1b4

rbnf: store divisor and substitutions in Rule to avoid recomputation

a5c65a7

rbnf: eagerly evaluate self.rulesets to avoid alias lookup every time

50b87fc

rbnf: replace .format & friends with f-strings

1f27ca1

rbnf: correctly dump rulesets/rules to JSON file

802e734

Add smoke test for all RBNF-enabled locales and rulesets

9905dd5

add extensive smoke testing with generated TOML files for all RBNF locales, fix linting errors

7a56053

blagasz mentioned this pull request Sep 28, 2025

Number spelling #682

Open

for python versions below 3.11 skip toml based rbnf tests

0cc8c13

blagasz force-pushed the master branch from baa3809 to 0cc8c13 Compare September 28, 2025 18:56

restore conftest

6896cf6

blagasz force-pushed the master branch from 51d0015 to 866f585 Compare September 28, 2025 21:16

add tests for rbnf engine

2105a53

blagasz force-pushed the master branch from 866f585 to 2105a53 Compare September 29, 2025 07:54

blagasz added 2 commits September 29, 2025 12:40

add a few negative number test cases and utility functions for test generation

3ff947c

comment out yet unused fractional parsing code to increase coverage

bb56a15

akx self-requested a review October 8, 2025 09:45

akx added this to the Babel 2.19 milestone Oct 8, 2025

akx requested changes Apr 16, 2026

blagasz wants to merge 16 commits into python-babel:master from blagasz:master
