Rule based number formatting #1232

Open
blagasz wants to merge 16 commits into python-babel:master from blagasz:master

Conversation

blagasz (Member) commented Sep 28, 2025

Number spelling based on a pure Python implementation of the CLDR RBNF (rule-based number format) engine.

Currently only whole numbers are supported (cardinal, ordinal, year), as the spelling of fractional numbers is not well-defined and not universally implemented in CLDR.

Based on the original PR #660 with useful enhancements from @akx at #682 and additional work from the original author:

  • extensive test cases are generated
  • dataclass added to make parsing context more readable in code
  • linting and debugging including the removal of recursion errors

Extensive test cases are added, but should be considered smoke tests until a native speaker reviews them for each language.

Based on an earlier discussion: #114 and referenced in #179

Supersedes #682

@blagasz blagasz mentioned this pull request Sep 28, 2025

codecov bot commented Sep 28, 2025

Codecov Report

❌ Patch coverage is 97.88732% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.27%. Comparing base (50635d8) to head (bb56a15).
⚠️ Report is 20 commits behind head on master.

Files with missing lines | Patch % | Lines
babel/rbnf.py | 97.83% | 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1232      +/-   ##
==========================================
+ Coverage   91.93%   92.27%   +0.33%     
==========================================
  Files          27       28       +1     
  Lines        4688     4972     +284     
==========================================
+ Hits         4310     4588     +278     
- Misses        378      384       +6     
Flag | Coverage Δ
macos-14-3.10 | 91.15% <94.01%> (+0.17%) ⬆️
macos-14-3.11 | 91.31% <97.88%> (+0.39%) ⬆️
macos-14-3.12 | 91.51% <97.88%> (+0.38%) ⬆️
macos-14-3.13 | 91.51% <97.88%> (+0.38%) ⬆️
macos-14-3.8 | 91.02% <94.01%> (+0.18%) ⬆️
macos-14-3.9 | 91.08% <94.01%> (+0.17%) ⬆️
macos-14-pypy3.10 | 91.15% <94.01%> (+0.17%) ⬆️
ubuntu-24.04-3.10 | 91.17% <94.01%> (+0.17%) ⬆️
ubuntu-24.04-3.11 | 91.33% <97.88%> (+0.39%) ⬆️
ubuntu-24.04-3.12 | 91.53% <97.88%> (+0.38%) ⬆️
ubuntu-24.04-3.13 | 91.53% <97.88%> (+0.38%) ⬆️
ubuntu-24.04-3.8 | 91.04% <94.01%> (+0.18%) ⬆️
ubuntu-24.04-3.9 | 91.10% <94.01%> (+0.17%) ⬆️
ubuntu-24.04-pypy3.10 | 91.17% <94.01%> (+0.17%) ⬆️
windows-2022-3.10 | 91.16% <94.01%> (+0.17%) ⬆️
windows-2022-3.11 | 91.32% <97.88%> (+0.39%) ⬆️
windows-2022-3.12 | 91.52% <97.88%> (+0.38%) ⬆️
windows-2022-3.13 | 91.52% <97.88%> (+0.38%) ⬆️
windows-2022-3.8 | 91.13% <94.01%> (+0.17%) ⬆️
windows-2022-3.9 | 91.09% <94.01%> (+0.17%) ⬆️
windows-2022-pypy3.10 | 91.16% <94.01%> (+0.17%) ⬆️

@akx akx self-requested a review October 8, 2025 09:45
@akx akx added this to the Babel 2.19 milestone Oct 8, 2025
akx (Member) left a comment

Thanks for reviving the number spelling challenge, and sorry it took me so long to get to reviewing this.

Beyond the inline comments (mostly missing type hints, but some other things too), could you please remove the non-human-audited rbnf_test_cases (were any of them audited)? We don't want to add all of that to the repo, especially when we don't know whether they're correct. The script that generates them should naturally stay (there are some comments on the script too).

I did take a quick look at the test cases for my native Finnish, and they look correct :)

Comment thread babel/numbers.py
SPACE_CHARS_RE = re.compile('|'.join(SPACE_CHARS))


def spell_number(number, locale=LC_NUMERIC, ruleset=None):

Could you add type annotations here, please?
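
One possible shape for the annotated signature, as a sketch only: the accepted number types, the `str`-only locale, and the placeholder body are assumptions made for illustration, and `LC_NUMERIC` is replaced by a stand-in so the snippet runs without Babel.

```python
import decimal
from typing import Optional, Union

# Stand-in for babel.numbers.LC_NUMERIC so the sketch runs without Babel installed.
LC_NUMERIC: Optional[str] = "en_US"


def spell_number(
    number: Union[int, decimal.Decimal],  # assumption: whole numbers only, per the PR description
    locale: Optional[str] = LC_NUMERIC,   # assumption: the real function may also accept Locale objects
    ruleset: Optional[str] = None,
) -> str:
    """Annotated signature only; the function body from the PR is omitted here."""
    raise NotImplementedError
```

The `typing.Optional`/`typing.Union` spelling is used here because the CI matrix above still includes Python 3.8.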

Comment thread babel/numbers.py
return speller.format(number, ruleset=ruleset)


def get_rbnf_rules(locale=LC_NUMERIC):

I'm not sure this needs to be a public API? (If it is, it also locks down the public API for RuleBasedNumberFormat.) Or... WDYT, what sort of app would consume the raw rules?

Comment thread scripts/import_cldr.py
territory_languages[territory.attrib['type']] = languages


# To help the negotiation in `babel.numbers.spell_number`

spell_number doesn't seem to directly read this; can you fix up the comment?

Comment thread scripts/import_cldr.py
Comment on lines +470 to +471
# there will be no rbnf rules for all locales
# there could be a separate iteration for rbnf rule files

I'm not sure I'm totally following this comment?

Comment thread babel/rbnf.py

# Undocumented syntax (←%rule-name←←)
# Trac ticket filed for CLDR update PL rbnf
# http://unicode.org/cldr/trac/ticket/10544

https://unicode-org.atlassian.net/browse/CLDR-10544 says this was resolved in https://unicode-org.atlassian.net/browse/CLDR-8909 and that the rules are flat now. Can you check whether these comments are still relevant?

Comment on lines +11 to +31
test_cases = None

def get_test_cases(
    template_toml_path: str = None,
    ruleset_name: str = None,
) -> dict:

    global test_cases

    if test_cases is None:
        if template_toml_path is None:
            with open("tests/rbnf_test_cases/_template.toml", "rb") as f:
                test_cases_temp = tomllib.load(f)
            test_cases = test_cases_temp
        else:
            with open(template_toml_path, "rb") as f:
                test_cases_temp = tomllib.load(f)
    else:
        test_cases_temp = test_cases

    return test_cases_temp[get_mapped_ruleset_name(ruleset_name)].items()

Looks like this simplifies to something like
(import functools.cache and pathlib.Path)

Suggested change
@cache
def _get_test_cases_template(path: str = "tests/rbnf_test_cases/_template.toml"):
    # tomllib.loads() parses a str; tomllib.load() expects a binary file object
    return tomllib.loads(Path(path).read_text(encoding="utf-8"))

def get_test_cases(ruleset_name: str):
    return _get_test_cases_template()[get_mapped_ruleset_name(ruleset_name)]

Comment on lines +34 to +45
def get_mapped_ruleset_name(ruleset: str) -> str:
    print(ruleset)
    mapping = {
        "spellout-numbering-year": "year",
        "spellout-numbering": "numbering",
        "spellout-ordinal": "ordinal",
        "spellout-cardinal": "cardinal",
    }
    for k, v in mapping.items():
        if ruleset.startswith(k):
            return v
    return 'numbering'  # default fallback

Suggested change
mapping = {
    "spellout-numbering-year": "year",
    "spellout-numbering": "numbering",
    "spellout-ordinal": "ordinal",
    "spellout-cardinal": "cardinal",
}

def get_mapped_ruleset_name(ruleset: str) -> str:
    for k, v in mapping.items():
        if ruleset.startswith(k):
            return v
    return 'numbering'  # default fallback
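
Note that the lookup depends on dict insertion order: `"spellout-numbering-year"` has to come before `"spellout-numbering"`, otherwise the shorter prefix would match first. A small sketch of that behaviour follows; the `RULESET_PREFIXES` name and the input ruleset names are illustrative assumptions.

```python
# Longer prefixes must precede shorter ones that they start with.
RULESET_PREFIXES = {
    "spellout-numbering-year": "year",
    "spellout-numbering": "numbering",
    "spellout-ordinal": "ordinal",
    "spellout-cardinal": "cardinal",
}


def get_mapped_ruleset_name(ruleset: str) -> str:
    # Dicts preserve insertion order, so the first matching prefix wins.
    for prefix, name in RULESET_PREFIXES.items():
        if ruleset.startswith(prefix):
            return name
    return "numbering"  # default fallback


print(get_mapped_ruleset_name("spellout-numbering-year"))     # year
print(get_mapped_ruleset_name("spellout-cardinal-feminine"))  # cardinal
print(get_mapped_ruleset_name("digits-ordinal"))              # numbering
```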

Comment on lines +48 to +54
def generate_test_for_locale(
    locale: Locale,
    output_toml_path: str,
    test_cases: dict = None,


) -> None:

Suggested change
def generate_test_for_locale(
    locale: Locale,
    output_toml_path: str,
) -> None:

Comment on lines +74 to +80
try:
    v2 = speller.format(k, ruleset=ruleset)
    print(f" {k} : '{v2}'")
    lines.append(f'{k} = "{v2}"')
except RBNFError as e:
    print(k, locale, ruleset, e)
    input()

Suggested change
v2 = speller.format(k, ruleset=ruleset)
lines.append(f'{k} = "{v2}"')

Comment on lines +88 to +95
def generate_all_tests(
    test_cases_toml_path: str = "tests/rbnf_test_cases/_template.toml",
    output_dir: str = "tests/rbnf_test_cases/",
) -> None:

    for locale in list(get_global('rbnf_locales')):
        output_toml_path = os.path.join(output_dir, f"{locale}.toml")
        generate_test_for_locale(locale, output_toml_path)

Suggested change
def generate_all_tests() -> None:
    for locale in get_global('rbnf_locales'):
        # f-string prefix is required so that {locale} is interpolated
        generate_test_for_locale(locale, f"tests/rbnf_test_cases/{locale}.toml")
