Concrete binary floating-point types, part 1
Introduction
The Swift standard library provides, depending on the platform, either two or three concrete floating-point types. Again, these are all defined as value types wrapping LLVM primitive types from the Builtin module:
```swift
@frozen
public struct Float {
  public // @testable
  var _value: Builtin.FPIEEE32
  /* ... */
}
```
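The _value property is internal, so user code cannot reach the Builtin value directly. The public bitPattern, exponentBitPattern, and significandBitPattern properties expose the same IEEE 754 binary32 fields, and the following sketch uses them to show what that 32-bit payload holds:

```swift
// Raw bits of 1.0 in binary32 and binary64.
let one: Float = 1.0
let oneBits = one.bitPattern                    // UInt32
let doubleOneBits = (1.0 as Double).bitPattern  // UInt64

// Decompose -2.5 = -(1.25 * 2^1) into its binary32 fields.
let value: Float = -2.5
let sign = value.sign                           // .minus
let biasedExponent = value.exponentBitPattern   // 1 + bias 127 = 128
let fraction = value.significandBitPattern      // 0.25 * 2^23 = 0x20_0000

// Rebuild a Float from raw bits: all-ones exponent with zero fraction is +infinity.
let fromBits = Float(bitPattern: 0x7F80_0000)

print(String(oneBits, radix: 16), String(doubleOneBits, radix: 16))  // 3f800000 3ff0000000000000
print(sign, biasedExponent, fraction, fromBits)                      // minus 128 2097152 inf
```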
Two floating-point types, Float and Double, are available on all platforms supported by Swift. They are 32-bit and 64-bit types, respectively, and type aliases are provided so that users can instead refer to these types as Float32 and Float64.
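Because Float32 and Float64 are only aliases, values move between the two spellings without any conversion. The sketch below also checks the widths: sign bit, exponent bits, and stored significand bits add up to 32 and 64 bits, respectively.

```swift
// Float32 and Float64 are type aliases, so no conversion is involved here.
let single: Float32 = 0.5
let asFloat: Float = single
let wide: Float64 = 0.25
let asDouble: Double = wide

print(MemoryLayout<Float>.size, MemoryLayout<Double>.size)  // 4 8
// 1 sign bit + exponent bits + stored significand bits = total width.
print(Float.exponentBitCount, Float.significandBitCount)    // 8 23
print(Double.exponentBitCount, Double.significandBitCount)  // 11 52
print(asFloat, asDouble)                                    // 0.5 0.25
```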
For the i386 and x86_64 architectures, the extended-precision floating-point type Float80 is also supported, except on Windows. On supported platforms, C’s long double data type is mapped to Float80 in Swift 4.2+, which makes it possible to use the full set of math functions for Float80 that are available on the platform. (In C/C++ programming with the Win32 API, the long double data type maps to double.)
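Float80 exists only on some targets, so code that uses it must be conditionally compiled. The sketch below encodes the conditions described above. The !os(Android) check is an extra assumption about Android targets that this article does not state. float80Widths is a hypothetical helper.

```swift
/// Hypothetical helper: Float80's stored significand and exponent widths,
/// or nil where the type is not available.
func float80Widths() -> (significand: Int, exponent: Int)? {
#if (arch(i386) || arch(x86_64)) && !os(Windows) && !os(Android)
  // Float80 stores its leading integer bit explicitly;
  // significandBitCount counts only the 63 fraction bits.
  return (Float80.significandBitCount, Float80.exponentBitCount)
#else
  return nil
#endif
}

if let widths = float80Widths() {
  print("Float80:", widths.significand, "fraction bits,", widths.exponent, "exponent bits")
} else {
  print("Float80 is not available on this platform")
}
```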
LLVM does support half-precision (16-bit) and quadruple-precision (128-bit) binary floating-point types, but this support is not surfaced by the Swift standard library. Some platforms offer limited native support for arithmetic using these formats.
A future version of Swift may include support for a 16-bit IEEE 754 binary floating-point type, which would likely be named Float16.
IEEE 754
Swift, like many other languages, attempts to provide a floating-point implementation faithful to the IEEE 754 technical standard.
Background:
For floating-point types, IEEE 754 defines basic and interchange formats, rounding rules, required operations, and exception handling that are meant to enable reliability and portability.
A full overview of IEEE 754 is well beyond the scope of this article; some key aspects of the standard are as follows:
- Data types are able to represent NaN (“not a number”), positive and negative infinity, and subnormal numbers that are very close to zero.
- Addition, subtraction, multiplication, division, and square root are required operations that must be correctly rounded; that is, the result must be as if the exact mathematical answer were computed and then rounded to a representable value according to the chosen rounding mode.
- There are five types of floating-point exceptions: invalid, division by zero, overflow, underflow, and inexact. Controversially, these are to be logged using global flags.
- A large set of functions (such as sine and cosine) is recommended but not required.
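The first two points can be observed directly. The sketch below shows NaN, negative infinity, and a subnormal value. It then checks correct rounding by computing the exact product of two Floats and rounding it once. The exact product needs at most 48 significant bits, so it fits in a Double; converting it to Float therefore rounds exactly once and must match Float's own multiplication.

```swift
let nan = Float.nan
let negInf = -Float.infinity
let subnormal = Float.leastNonzeroMagnitude

print(nan == nan)                               // false: NaN is unordered, even with itself
print(negInf < -Float.greatestFiniteMagnitude)  // true
print(subnormal > 0, subnormal.isSubnormal)     // true true (where subnormals are supported)

// Correct rounding: the exact product, rounded once, equals a * b.
let a: Float = 0.1, b: Float = 3.0
let exact = Double(a) * Double(b)               // exact: 24 + 24 bits fit in 53
print(Float(exact) == a * b)                    // true
```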
Until recently, LLVM lacked constrained floating-point intrinsics to support the use of dynamic rounding modes or floating-point exception behavior. By default, the rounding mode is assumed to be round-to-nearest (ties to even) and floating-point exceptions are ignored.
Swift does not expose any APIs to change the rounding mode or floating-point exception behavior, nor is it possible to interrogate floating-point status flags. (Such limitations are also found in Rust.)
Note that the rounding mode, or the rounding rule used to fit a result to the precision of a given floating-point format (IEEE 754-2008 §4.3), is not necessarily the same as the rounding rule used to round a value to the nearest integer (IEEE 754-2008 §5.9). In Swift, it is not possible to change the former, but it is possible to choose any rule for the latter.
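The distinction can be sketched with the standard library's FloatingPointRoundingRule. Rounding to an integral value accepts any rule per call. Ordinary arithmetic, by contrast, always rounds to nearest with ties to even: 1 + ulpOfOne/2 lies exactly halfway between 1.0 and its successor, and the tie resolves to 1.0.

```swift
let samples: [Double] = [2.5, -2.5, 3.5]
let rules: [FloatingPointRoundingRule] =
  [.toNearestOrEven, .toNearestOrAwayFromZero, .towardZero, .up, .down]

// Rounding to an integral value: any rule can be chosen.
for v in samples {
  print(v, rules.map { v.rounded($0) })
}
// 2.5 [2.0, 3.0, 2.0, 3.0, 2.0]
// -2.5 [-2.0, -3.0, -2.0, -2.0, -3.0]
// 3.5 [4.0, 4.0, 3.0, 4.0, 3.0]

// Arithmetic rounding is fixed: this exact tie goes to the even neighbor, 1.0.
let halfUlp = Double.ulpOfOne / 2
print(1.0 + halfUlp == 1.0)  // true
```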
Note that floating-point exceptions are to be distinguished from Swift errors and from runtime traps.
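In practice, an operation that signals an IEEE 754 exception simply produces a special value, and execution continues. The sketch below contrasts this with integer overflow, which traps in Swift unless a reporting or wrapping operation is used.

```swift
let huge = Double.greatestFiniteMagnitude
let overflowed = huge * 2      // overflow exception: result is +infinity
let zero = 0.0
let invalid = zero / zero      // invalid exception: result is NaN
let divided = 1.0 / zero       // division-by-zero exception: result is +infinity

// No Swift error is thrown and nothing traps.
print(overflowed, invalid, divided)  // inf nan inf

// Int.max + 1 would trap at run time; the reporting variant surfaces the overflow instead.
let (wrapped, didOverflow) = Int.max.addingReportingOverflow(1)
print(wrapped == Int.min, didOverflow)  // true true
```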
C mathematical functions
The Swift standard library provides an “overlay” that makes some changes to improve the user experience of working with C mathematical functions and disables certain incompatible functions.
IEEE 754 recommends, but does not require, implementations to provide elementary functions such as sine, arctangent, and binary logarithm. The Swift standard library will offer APIs for such operations in a future version of Swift. For now, the Swift standard library provides only IEEE 754 required operations such as square root. For other functions, users need to use the C standard library, which can be imported on macOS as part of the Darwin module and on Linux as part of the Glibc module (alternatively, users can import the Foundation module).
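The following is a minimal sketch of calling C math functions from Swift. It assumes Darwin on Apple platforms and Glibc on Linux; the Foundation fallback for other platforms is an assumption. Square root needs no import, because it is an IEEE 754 required operation provided by the standard library.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#else
import Foundation  // assumption: re-exports the platform C math library
#endif

let angle = Double.pi / 4
let s = sin(angle)          // C sine
let a = atan(1.0)           // C arctangent
let l = log2(1024.0)        // C binary logarithm
let r = angle.squareRoot()  // standard library square root, no import needed
print(s, a, l, r)
```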
LLVM provides intrinsics that are equivalent to some C mathematical functions, including sine and cosine. The Swift overlay substitutes the LLVM intrinsic for the corresponding C library function where possible.
Note that not all functions are implemented with identical precision in Darwin and Glibc, and the same discrepancies among platforms apply to functions provided by the Swift standard library.
A comparison of IEEE 754 required and recommended operations, their Swift standard library names, and their C standard library overlay names is presented below. Here x, y, and z are values of floating-point type T, n is a value of type Int, and “Swift x” denotes a future version of Swift.
Not shown: conversion and comparison operations. Not available in Swift: conformance predicates and operations on subsets of flags.

| IEEE 754 | Swift standard library | C standard library overlay |
| --- | --- | --- |
| **Homogeneous general computational operations** | | |
| roundToIntegralTiesToEven(x) | x.rounded(.toNearestOrEven) | |
| roundToIntegralTiesToAway(x) | x.rounded() or x.rounded(.toNearestOrAwayFromZero) | round(x) |
| roundToIntegralTowardZero(x) | x.rounded(.towardZero) | trunc(x) |
| roundToIntegralTowardPositive(x) | x.rounded(.up) | ceil(x) |
| roundToIntegralTowardNegative(x) | x.rounded(.down) | floor(x) |
| roundToIntegralExact(x) | | |
| nextUp(x) | x.nextUp | Unavailable (Swift x): nextafter(x, .infinity) |
| nextDown(x) | x.nextDown | Unavailable (Swift x): nextafter(x, -.infinity) |
| remainder(x, y) | x.remainder(dividingBy: y) | remainder(x, y) |
| | x.truncatingRemainder(dividingBy: y) | fmod(x, y) |
| minNum(x, y) | T.minimum(x, y) | Unavailable (Swift x): fmin(x, y) |
| maxNum(x, y) | T.maximum(x, y) | Unavailable (Swift x): fmax(x, y) |
| | Swift.max(x - y, 0) | fdim(x, y) |
| minNumMag(x, y) | T.minimumMagnitude(x, y) | |
| maxNumMag(x, y) | T.maximumMagnitude(x, y) | |
| **Scaling operations** | | |
| scaleB(x, n) | T(sign: .plus, exponent: n, significand: x) | scalbn(x, n) |
| logB(x) | x.exponent | Unavailable (Swift 4.2): ilogb(x) |
| **Arithmetic operations (excluding conversion operations)** | | |
| addition(x, y) | x + y | |
| subtraction(x, y) | x - y | |
| multiplication(x, y) | x * y | |
| division(x, y) | x / y | |
| squareRoot(x) | x.squareRoot() or (Swift x) T.sqrt(x) | sqrt(x) |
| fusedMultiplyAdd(x, y, z) | z.addingProduct(x, y) | fma(x, y, z) |
| **Sign bit operations** | | |
| copy(x) | x | |
| negate(x) | -x | |
| abs(x) | abs(x) or x.magnitude | Unavailable (Swift 4.2): fabs(x) |
| copySign(x, y) | T(signOf: y, magnitudeOf: x) | copysign(x, y) |
| **General non-computational operations** | | |
| class(x) | x.floatingPointClass | Unavailable: fpclassify(x) |
| isSignMinus(x) | x.sign == .minus | Unavailable: signbit(x) |
| isNormal(x) | x.isNormal | Unavailable: isnormal(x) |
| isFinite(x) | x.isFinite | Unavailable: isfinite(x) |
| isZero(x) | x.isZero | |
| isSubnormal(x) | x.isSubnormal | |
| isInfinite(x) | x.isInfinite | Unavailable: isinf(x) |
| isNaN(x) | x.isNaN | Unavailable: isnan(x) |
| isSignaling(x) | x.isSignalingNaN | |
| isCanonical(x) | x.isCanonical | |
| radix(x) | T.radix | |
| totalOrder(x, y) | x.isTotallyOrdered(belowOrEqualTo: y) | |
| totalOrderMag(x, y) | | |
| **Additional elementary functions (Swift x)** | | |
| sin | T.sin(x) | sin(x) |
| cos | T.cos(x) | cos(x) |
| tan | T.tan(x) | tan(x) |
| sinPi | | |
| cosPi | | |
| asin | T.asin(x) | asin(x) |
| acos | T.acos(x) | acos(x) |
| atan | T.atan(x) | atan(x) |
| atanPi | | |
| sinh | T.sinh(x) | sinh(x) |
| cosh | T.cosh(x) | cosh(x) |
| tanh | T.tanh(x) | tanh(x) |
| asinh | T.asinh(x) | asinh(x) |
| acosh | T.acosh(x) | acosh(x) |
| atanh | T.atanh(x) | atanh(x) |
| exp | T.exp(x) | exp(x) |
| exp2 | T.exp2(x) | exp2(x) |
| exp10 | T.exp10(x) or exp10(x) | |
| expm1 | T.expm1(x) | expm1(x) |
| exp2m1 | | |
| exp10m1 | | |
| log | T.log(x) | log(x) |
| log2 | T.log2(x) | log2(x) |
| | T.log2(x).rounded(.down) | Unavailable (Swift x): logb(x) |
| log10 | T.log10(x) | log10(x) |
| logp1 | T.log1p(x) | log1p(x) |
| log2p1 | | |
| log10p1 | | |
| compound(x, n) | | |
| pow(x, y) | T.pow(x, y) | pow(x, y) |
| powr(x, y) | | |
| pown(x, n) | T.pow(x, n) or pow(x, n) | |
| rootn(x, n) | T.root(x, n) or root(x, n) | |
| | T.root(x, 3) or root(x, 3) | cbrt(x) |
| rSqrt | | |
| **Additional real operations (Swift x)** | | |
| atan2(y, x) | T.atan2(y: y, x: x) or atan2(y: y, x: x) | atan2(y, x) |
| atan2Pi(y, x) | | |
| hypot(x, y) | T.hypot(x, y) | hypot(x, y) |
| | T.erf(x) | erf(x) |
| | T.erfc(x) | erfc(x) |
| | T.gamma(x) or gamma(x) | tgamma(x) |
| | (T.logGamma(x), T.signGamma(x) == .plus ? 1 : -1) or (logGamma(x), signGamma(x) == .plus ? 1 : -1) | lgamma(x) |
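The sketch below exercises several rows of the table that already exist in the standard library, using Double for T. It shows three things: the two remainder operations differ in how they round the quotient, minNum ignores a quiet NaN, and totalOrder distinguishes -0.0 from +0.0 even though == does not.

```swift
let x = 7.5, y = 2.0, z = 1.0

let ieeeRemainder = x.remainder(dividingBy: y)                 // remainder: 7.5 - 4 * 2 = -0.5
let fmodStyle = x.truncatingRemainder(dividingBy: y)           // like fmod: 7.5 - 3 * 2 = 1.5
let scaled = Double(sign: .plus, exponent: 3, significand: x)  // scaleB(x, 3): 60.0
let logB = x.exponent                                          // 7.5 = 1.875 * 2^2, so 2
let fused = z.addingProduct(x, y)                              // fusedMultiplyAdd(x, y, z): 16.0
let copied = Double(signOf: -0.0, magnitudeOf: x)              // copySign(x, -0.0): -7.5
let minNum = Double.minimum(.nan, x)                           // quiet NaN is ignored: 7.5
let maxMag = Double.maximumMagnitude(-3.0, 2.0)                // larger magnitude wins: -3.0

let negZero = -0.0
let equalUnderEq = (negZero == 0.0)                            // true
let positiveBeforeNegative = (0.0).isTotallyOrdered(belowOrEqualTo: negZero)  // false

print(ieeeRemainder, fmodStyle, scaled, logB, fused, copied, minNum, maxMag)
print(equalUnderEq, positiveBeforeNegative)
```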
Current implementations of T.pow(x, n) and T.root(x, n) give inaccurate results if n is so large that conversion to T would round.
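Parity shows why such a conversion matters. The sketch assumes an implementation that converts n to T before computing. powBySquaring is a hypothetical helper that keeps n as an integer; it is not the SE-0246 implementation.

```swift
// Hypothetical helper: x^n by binary exponentiation, with n kept as an integer.
func powBySquaring(_ x: Double, _ n: Int64) -> Double {
  var base = n < 0 ? 1 / x : x
  var e = n.magnitude  // UInt64, so Int64.min does not overflow
  var result = 1.0
  while e > 0 {
    if e & 1 == 1 { result *= base }
    base *= base
    e >>= 1
  }
  return result
}

let n: Int64 = (1 << 53) + 1   // odd; needs 54 significant bits
let converted = Double(n)      // a tie, rounded to even: 2^53
print(Int64(converted) == n)   // false: the conversion rounded
print(Int64(converted) % 2)    // 0: the converted exponent is even
print(powBySquaring(-1, n))    // -1.0
```

Any routine that receives Double(n) instead of n would be computing (-1)^(2^53), which is 1.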
For more information on the additional elementary functions and real operations to be added in a future version of Swift, see the Swift Evolution proposal SE-0246: Generic math(s) functions.
Finite constants
Similarly, some finite constants defined in the C standard library have equivalent static properties in the Swift standard library with clarified names.
| Swift | C (float) | C (double) |
| --- | --- | --- |
| greatestFiniteMagnitude | FLT_MAX | DBL_MAX |
| leastNormalMagnitude | FLT_MIN | DBL_MIN |
| leastNonzeroMagnitude | FLT_TRUE_MIN | DBL_TRUE_MIN |
| pi | M_PI | M_PI |
| ulpOfOne | FLT_EPSILON | DBL_EPSILON |
The use of “max” and “min” can be misleading. Even within the Swift project itself, users have mistaken FLT_MIN for the minimum representable value (by analogy with Int.min). However, FLT_MIN is not even negative. Nor is it the least representable positive value if the platform supports subnormal values: in C, that value is known as FLT_TRUE_MIN.
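The Swift names make the distinction explicit, as the sketch below shows. The subnormal result assumes a platform that supports subnormal values. The closest analogue of Int.min is the negated greatestFiniteMagnitude, or -infinity if non-finite values are allowed.

```swift
let normalMin = Float.leastNormalMagnitude              // FLT_MIN: 2^-126
let trueMin = Float.leastNonzeroMagnitude               // FLT_TRUE_MIN where subnormals exist
let mostNegativeFinite = -Float.greatestFiniteMagnitude // closest analogue of Int.min

print(normalMin > 0)                                    // true: the "min" constant is positive
print(trueMin < normalMin, trueMin.isSubnormal)         // true true on platforms with subnormals
print(mostNegativeFinite.nextDown == -Float.infinity)   // true: nothing finite lies below it
```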
Note that .pi is rounded toward zero for reasons discussed later. Consequently, Float(M_PI) != .pi: converting M_PI to Float rounds to nearest, which for Float gives a value above π.
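The two Float values can be compared directly. The sketch imports M_PI from Darwin or Glibc. Elsewhere it falls back to Double.pi, on the assumption that both equal the Double nearest to π. cPi is a hypothetical name used only here.

```swift
#if canImport(Darwin)
import Darwin
let cPi: Double = M_PI
#elseif canImport(Glibc)
import Glibc
let cPi: Double = M_PI
#else
let cPi: Double = .pi  // assumption: stand-in for M_PI
#endif

let roundedToNearest = Float(cPi)  // conversion rounds to nearest: 3.1415927
let towardZero = Float.pi          // rounded toward zero: 3.1415925

print(roundedToNearest != towardZero)         // true
print(towardZero < roundedToNearest)          // true: .pi lies below π
print(towardZero.nextUp == roundedToNearest)  // true: the two are adjacent Floats
print(Double.pi == cPi)                       // true: for Double, both roundings agree
```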
The use of “epsilon” was avoided because that term has varying definitions among other programming languages and suggests that it might be appropriate for use as a measure of tolerance for floating-point comparisons, which is generally inadvisable.
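ulpOfOne is the gap between 1.0 and the next representable value, and the spacing of values scales with magnitude. The sketch below shows why a fixed absolute tolerance of ulpOfOne behaves badly. It is far too strict for large values and far too loose for small ones.

```swift
let eps = Double.ulpOfOne                 // 2^-52
print(eps == (1.0).ulp)                   // true

// Near 1e10, adjacent Doubles are about 1.9e-6 apart, so
// abs(a - b) < eps there accepts only exactly equal values.
let big = 1.0e10
print(big.nextUp - big > eps)             // true

// Near 1e-20, the same test accepts values that differ by a factor of 2.
let small = 1.0e-20
print(abs(small - 2 * small) < eps)       // true
```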
For more information on the rationale for names chosen in Swift, see the Swift Evolution proposal SE-0067: Enhanced floating-point protocols.
27 February–3 March 2018
Updated 28 July 2019