Johannes Müller 08 May 2019

Formatting pretty numbers for humans

With Crystal 0.28.0 we have a new feature for formatting numbers for human readers.

Previously, the only options were #to_s on the various Number types or, at best, sprintf. Both offer only limited output formats and focus on how numbers are represented for computers, not on readability for humans.

When showing numbers in a user interface, they need to be understandable by human readers.

Format a Number

Meet the new Number#format method.

It prints numbers in a customizable format that can reproduce the way numbers are usually written for humans.

Number styles

Numbers can be formatted using configurable decimal separator and thousands delimiter:

123_456.789.format('.', ',')   # => "123,456.789"
123_456.789.format(',', '.')   # => "123.456,789"
123_456.789.format(',', ' ')   # => "123 456,789"
123_456.789.format(',', '\'')  # => "123'456,789"

The number of digits in a group is also configurable. This can be used, for example, for Chinese numbers, which are grouped by ten thousands:

123_456.789.format('.', ',', group: 4) # => "12,3456.789"
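A few more calls on integers make the grouping rule visible. Judging from the example above, groups are counted leftwards from the decimal separator, so only the leftmost group can be shorter than group. This is a small sketch; the separator argument is irrelevant for integers without decimals.

```crystal
# Groups of four, counted from the right: 1 | 0000 | 0000
puts 100_000_000.format(group: 4)         # => "1,0000,0000"

# Space as thousands delimiter (the separator ',' is unused for integers)
puts 1_234_567.format(',', ' ')           # => "1 234 567"

# Custom delimiter combined with a custom group size
puts 1_234_567.format(',', '.', group: 4) # => "123.4567"
```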

There are many different styles used in different cultural contexts, and this method is flexible enough to represent most common formats.

How the world separates its digits provides an overview of international styles, and the Wikipedia article on Decimal Separators provides some more insight on this topic.

Decimal places

Floating point numbers can produce a lot of decimal places when converted to a human-readable string. For user output such detail is usually a distraction and displaying a few decimal places is plenty.

The number of decimal places can be configured directly in the #format method:

123_456.789.format(decimal_places: 2) # => "123,456.79"
123_456.789.format(decimal_places: 0) # => "123,457"
123_456.789.format(decimal_places: 4) # => "123,456.7890"

Compared to rounding the value manually before formatting it, this is easier and allows for more options.

The number of decimal places is fixed by default. Trailing zeros will only be omitted when only_significant is true:

123_456.789.format(decimal_places: 6)                         # => "123,456.789000"
123_456.789.format(decimal_places: 6, only_significant: true) # => "123,456.789"
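To illustrate why decimal_places is more convenient than rounding manually, here is a minimal sketch of a price label. The helper name price_label is hypothetical; it simply wraps Number#format. Rounding alone neither pads trailing zeros nor inserts delimiters.

```crystal
# Hypothetical helper: prices always show exactly two decimal places.
def price_label(amount : Float64) : String
  amount.format(decimal_places: 2)
end

puts price_label(1234.5)   # => "1,234.50"

# Manual rounding: no zero padding and no thousands delimiter
puts 1234.5.round(2)       # => 1234.5

# With only_significant, the trailing zero is dropped again
puts 1234.5.format(decimal_places: 2, only_significant: true) # => "1,234.5"
```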

Humanize a Number

When numbers spanning several orders of magnitude are shown side by side, it’s difficult to represent the whole range of values in a meaningful way.

In such cases, it’s common to express the magnitude of a value using a quantifier.

This is what Number#humanize is for: it rounds the number to the nearest thousands magnitude with a given number of significant digits and appends the matching quantifier.

1_200_000_000.humanize # => "1.2G"
0.000_000_012.humanize # => "12.0n"
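As a quick sketch of the use case, the loop below humanizes a few made-up readings that span several orders of magnitude, and the last line passes a custom decimal separator just like Number#format accepts. Exact outputs are left to the reader to run; each value should end up with the SI prefix for its magnitude (k, M and m here).

```crystal
# Made-up readings that span many orders of magnitude
readings = [12_500, 3_400_000, 0.0042]

readings.each do |value|
  puts "#{value} -> #{value.humanize}"
end

# Same separator argument as Number#format: decimal comma instead of point
puts 12_500.humanize(3, ',')
```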

It has the same arguments for decimal separator and thousands delimiter as Number#format, so the style is configurable exactly the same way.

The number of significant digits can be adjusted with precision, but the default value of 3 is probably a good fit for most applications. When significant is false, precision instead sets a fixed number of decimal digits regardless of the number’s value.
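The two knobs can be compared side by side. This sketch assumes, as the API documentation describes, that significant: false switches precision from counting significant digits to counting decimal places; the value is arbitrary and all three results use the M prefix.

```crystal
value = 1_234_567

puts value.humanize                        # precision 3 (default), significant digits
puts value.humanize(2)                     # precision 2, significant digits
puts value.humanize(2, significant: false) # precision 2, counted as decimal places
```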

By default, the quantifiers are the SI prefixes (k, M, G, etc.), but they’re completely configurable, either by providing a list or a proc.

Customizable quantifiers

Number#humanize can take a proc argument that calculates the number of digits and the quantifier for a specific magnitude.

The following example shows how to format a length in metric units, including the unit designator. It deviates from the default implementation by using the common centimeter unit for values between 0.01 and 0.99 (which the generic mapping would express in millimeters). All other values use the generic SI prefixes (provided by Number.si_prefix).

def humanize_length(number)
  number.humanize do |magnitude, _| # the scaled value is not needed here
    case magnitude
    when -2, -1 then {-2, " cm"}
    else
      magnitude = Number.prefix_index(magnitude)
      {magnitude, " #{Number.si_prefix(magnitude)}m"}
    end
  end
end

humanize_length(1_420) # => "1.42 km"
humanize_length(0.23)  # => "23.0 cm"
humanize_length(0.05)  # => "5.0 cm"
humanize_length(0.001) # => "1.0 mm"

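The same block form can map magnitudes to words instead of SI prefixes. In this hypothetical helper, humanize_count, the block returns the magnitude to scale by together with the quantifier string, following the pattern of humanize_length above; values below one thousand are left unscaled.

```crystal
# Hypothetical helper: spell out large magnitudes as words.
def humanize_count(value : Number) : String
  value.humanize do |magnitude, _|
    if magnitude >= 9
      {9, " billion"}
    elsif magnitude >= 6
      {6, " million"}
    elsif magnitude >= 3
      {3, " thousand"}
    else
      {0, ""} # small values and fractions stay unscaled
    end
  end
end

puts humanize_count(1_500)
puts humanize_count(2_500_000)
puts humanize_count(7_300_000_000)
```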
Humanize Bytes

The third method is Int#humanize_bytes, which formats a number of bytes (for example, a memory size) in a typical format. It supports both IEC (Ki, Mi, Gi, Ti, Pi, Ei, Zi, Yi) and JEDEC (K, M, G, T, P, E, Z, Y) prefixes.

1.humanize_bytes                          # => "1B"
1024.humanize_bytes                       # => "1.0kiB"
1536.humanize_bytes                       # => "1.5kiB"
524288.humanize_bytes(format: :JEDEC)     # => "512kB"
1073741824.humanize_bytes(format: :JEDEC) # => "1.0GB"

The implementation of this method is another example of a custom format based on Number#humanize.
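To show what binary prefixes mean arithmetically, here is a deliberately simplified, hand-rolled sketch. It is not the standard library implementation and does not use Number#humanize: it divides by 1024 until the value fits the next unit and formats the result with one decimal place. The names simple_iec and IEC_UNITS are hypothetical.

```crystal
# Hypothetical, simplified IEC formatter (not the stdlib implementation).
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]

def simple_iec(bytes : Int) : String
  value = bytes.to_f
  unit = 0
  # Each step up the unit ladder is a factor of 1024, not 1000.
  while value >= 1024 && unit < IEC_UNITS.size - 1
    value /= 1024
    unit += 1
  end
  return "#{bytes} B" if unit == 0
  "#{value.format(decimal_places: 1)} #{IEC_UNITS[unit]}"
end

puts simple_iec(512)       # => "512 B"
puts simple_iec(1536)      # => "1.5 KiB"
puts simple_iec(1_048_576) # => "1.0 MiB"
```

Unlike Int#humanize_bytes, this sketch always uses one decimal place instead of a number of significant digits.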

Summary

These new methods provide great features for making numbers look pretty to the reader.

They do not provide style mappings for specific locales. This is a non-trivial task that should be left to dedicated I18N libraries. But they’re useful building blocks that such libraries can build upon. And they’re immediately usable when you don’t need to support different locales.
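As a sketch of how a localization layer might build on Number#format, the hypothetical table below maps language codes to separator and delimiter pairs taken from the examples earlier in this post. A real I18N library would derive these from proper locale data rather than hard-coding them; NUMBER_STYLES and format_for are illustrative names only.

```crystal
# Hypothetical style table: {decimal separator, thousands delimiter}
NUMBER_STYLES = {
  "en" => {'.', ','},
  "de" => {',', '.'},
  "fr" => {',', ' '}, # simplified: French typography uses a non-breaking space
}

# Unknown language codes fall back to the default '.' and ','.
def format_for(language : String, value : Number, decimal_places : Int32? = nil) : String
  separator, delimiter = NUMBER_STYLES.fetch(language, {'.', ','})
  value.format(separator, delimiter, decimal_places)
end

puts format_for("de", 1_234_567) # => "1.234.567"
puts format_for("en", 1234.5, 2) # => "1,234.50"
puts format_for("fr", 1234.5, 2) # => "1 234,50"
```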

The implementation is not perfect, though. Localization is complex and hard to get right, and as always, the devil is in the details. For example, the thousands delimiter and group size are configurable, but they are the same for every group, so the Indian numbering system (which groups the last three digits, then pairs of digits) can’t be represented this way. Also, only Arabic numerals are supported. And there are probably many other cases that would require more specialised behaviour.
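The Indian grouping rule is simple to state but needs two different group sizes, which a single group argument cannot express. A hand-rolled sketch for integers could look like this; format_indian is a hypothetical helper, not part of the standard library.

```crystal
# Hypothetical helper: Indian-style grouping keeps the last three digits
# together and groups the remaining digits in pairs.
def format_indian(number : Int) : String
  digits = number.abs.to_s
  sign = number < 0 ? "-" : ""
  return sign + digits if digits.size <= 3

  head = digits[0, digits.size - 3] # everything before the last three digits
  tail = digits[digits.size - 3, 3] # the last three digits
  groups = [] of String
  # Peel off two-digit groups from the right end of the head.
  while head.size > 2
    groups.unshift(head[head.size - 2, 2])
    head = head[0, head.size - 2]
  end
  groups.unshift(head)
  sign + (groups << tail).join(",")
end

puts format_indian(100_000)    # => "1,00,000"
puts format_indian(12_345_678) # => "1,23,45,678"
puts 12_345_678.format         # => "12,345,678" (fixed groups of three)
```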

But it’s probably good for more than 90% of typical use cases, and already useful in many places. And there is always room for improvement.

More background information can be found in the PR which brought these features.

Also a good read on formatting numbers from a more general perspective: Formatting numbers for machines and mortals by Hjalmar Gislason.
