Floating-point number (FP) – Generalize the base + small binary/decimal notes
Precision (TODO ranges)
| Precision | Bits | Identifiers | Note |
|---|---|---|---|
| Half | 16 = 1 + 5 + 10 | FP16, float16 | IEEE 754 binary16 |
| Single | 32 = 1 + 8 + 23 | FP32, float32, float | IEEE 754 binary32 |
| Double | 64 = 1 + 11 + 52 | FP64, float64, double | IEEE 754 binary64 |
| Brain | 16 = 1 + 8 + 7 | BF16, bfloat16 | used in ANNs |
where the bits are, in order: sign, exponent, and significand (excluding the hidden bit).
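
To make the bit layout concrete, here is a minimal Python sketch that splits a value into its sign, exponent, and stored significand fields for each format in the table. The names `LAYOUTS`, `raw_bits`, and `split_fields` are made up for this illustration. The bfloat16 case assumes the value is simply the top 16 bits of the binary32 encoding (truncation), whereas real hardware typically rounds.

```python
import struct

# (exponent bits, stored significand bits) per format, as in the table above
LAYOUTS = {
    "binary16": (5, 10),
    "binary32": (8, 23),
    "binary64": (11, 52),
    "bfloat16": (8, 7),
}

def raw_bits(x, fmt):
    """Return the bit pattern of x in the given format as an unsigned int."""
    if fmt == "binary16":
        return int.from_bytes(struct.pack(">e", x), "big")
    if fmt == "binary32":
        return int.from_bytes(struct.pack(">f", x), "big")
    if fmt == "binary64":
        return int.from_bytes(struct.pack(">d", x), "big")
    if fmt == "bfloat16":
        # Assumption: take the upper 16 bits of binary32 (truncation, no rounding).
        return int.from_bytes(struct.pack(">f", x), "big") >> 16
    raise ValueError(f"unknown format: {fmt}")

def split_fields(x, fmt):
    """Return (sign, biased exponent, stored significand) as integers."""
    exp_bits, frac_bits = LAYOUTS[fmt]
    bits = raw_bits(x, fmt)
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

print(split_fields(-2.5, "binary16"))  # (1, 16, 256)
print(split_fields(1.0, "bfloat16"))   # (0, 127, 0)
```

The exponent is returned in its biased form (bias 15 for binary16, 127 for binary32 and bfloat16, 1023 for binary64), and the hidden leading 1 of the significand is not included.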