Built-in Improvements in ES6

Thus far in the book, we’ve discussed entirely new language syntax, such as property value shorthands, arrow functions, destructuring, or generators; and entirely new built-ins, such as WeakMap, Proxy, or Symbol. This chapter, on the other hand, is mostly devoted to existing built-ins that were improved in ES6. These improvements consist mostly of new instance methods, properties, and utility methods.

Numbers

ES6 introduces numeric literal representations for binary and octal numbers.

Binary and Octal Literals

Before ES6, your best bet when it comes to binary representation of integers was to just pass them to parseInt with a radix of 2.

parseInt('101', 2)
// <- 5

You can now use the new 0b prefix to represent binary integer literals. You could also use the 0B prefix, with a capital B. The two notations are equivalent.

console.log(0b000) // <- 0
console.log(0b001) // <- 1
console.log(0b010) // <- 2
console.log(0b011) // <- 3
console.log(0b100) // <- 4
console.log(0b101) // <- 5
console.log(0b110) // <- 6
console.log(0b111) // <- 7

In ES3, parseInt interpreted strings of digits starting with a 0 as an octal value. That meant things got weird quickly when you forgot to specify a radix of 10. As a result, specifying the radix of 10 became a best practice, so that user input like 012 wouldn’t unexpectedly be parsed as the integer 10.

console.log(parseInt('01'))
// <- 1
console.log(parseInt('012'))
// <- 10
console.log(parseInt('012', 10))
// <- 12

When ES5 came around, parseInt stopped treating a leading 0 as an octal prefix, so strings such as '012' are now parsed in base 10 by default. It was still recommended that you specify a radix for backward compatibility purposes. If you wanted to parse strings as octal values, you could explicitly pass in a radix of 8 as the second argument.

console.log(parseInt('100', 8))
// <- 64

You can now use the 0o prefix for octal literals, which are new in ES6. You could also use 0O, which is equivalent. Having a 0 followed by an uppercase O may be hard to distinguish in some typefaces, which is why it is suggested that you stick with the lowercase 0o notation.

console.log(0o001) // <- 1
console.log(0o010) // <- 8
console.log(0o100) // <- 64

You might be used to hexadecimal literals present in other languages, commonly prefixed with 0x. Those have been part of the JavaScript language since long before ES6. The prefix for literal hexadecimal notation is either 0x, or 0X, as shown in the following code snippet.

console.log(0x0ff) // <- 255
console.log(0xf00) // <- 3840
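
As a quick illustration of how these notations relate, the following sketch writes the same integer in binary, octal, decimal, and hexadecimal, and then converts it back into strings using Number#toString, which accepts a radix. The sameValue name is made up for this example.

```javascript
// The same integer, written with each literal notation.
const sameValue = [0b11111111, 0o377, 255, 0xff]
console.log(sameValue.every(value => value === 255))
// <- true

// Number#toString takes a radix, but it never adds a prefix.
console.log((255).toString(2)) // <- '11111111'
console.log((255).toString(8)) // <- '377'
console.log((255).toString(16)) // <- 'ff'
```

Literals are only a notation: once evaluated, there’s no trace of which base was used to write the number.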

Besides these minor syntax changes where octal and binary literals were introduced, a few methods were added to Number in ES6. The first four Number methods that we’ll be discussing (Number.isNaN, Number.isFinite, Number.parseInt, and Number.parseFloat) already existed as functions in the global namespace. Of those, Number.isNaN and Number.isFinite are slightly different from their global counterparts in that they don’t coerce nonnumeric values into numbers before producing a result.

Number.isNaN

This method is almost identical to the global isNaN method. Number.isNaN returns whether the provided value is NaN, whereas isNaN returns whether value is not a number. These two questions have slightly different answers.

The next snippet quickly shows that, when passed to Number.isNaN, anything that’s not NaN will return false, while NaN will produce true. Note how in the last case we’re already passing NaN to Number.isNaN, as that’s the result of dividing two strings.

Number.isNaN(123)
// <- false, integers are not NaN
Number.isNaN(Infinity)
// <- false, Infinity is not NaN
Number.isNaN('a hundred')
// <- false, 'a hundred' is not NaN
Number.isNaN(NaN)
// <- true, NaN is NaN
Number.isNaN('a hundred' / 'two')
// <- true, 'a hundred' / 'two' is NaN, NaN is NaN

The isNaN method, in contrast, casts nonnumeric values passed to it before evaluating them against NaN. This results in significantly different return values. In the following example, each alternative produces different results because isNaN, unlike Number.isNaN, casts the value passed to it through Number first.

isNaN('a hundred')
// <- true, because Number('a hundred') is NaN
isNaN(new Date())
// <- false, because Number(new Date()) uses Date#valueOf,
//    which returns a unix timestamp
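
One way to make the difference concrete is a hand-rolled equivalent of Number.isNaN. The numberIsNaN function below is a hypothetical sketch, not something that ships with the language. It relies on NaN being the only value in JavaScript that isn’t equal to itself.

```javascript
// Hypothetical ponyfill: no casting, just a type check and a self-comparison.
function numberIsNaN(value) {
  return typeof value === 'number' && value !== value
}

console.log(numberIsNaN(NaN)) // <- true
console.log(numberIsNaN('a hundred')) // <- false, no casting takes place
console.log(isNaN('a hundred')) // <- true, the global casts through Number
```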

Number.isNaN is more precise than its global counterpart, because it doesn’t involve casting. There are still a few reasons why Number.isNaN can be a source of confusion.

First off, isNaN casts input through Number(value) before comparison, while Number.isNaN doesn’t. Neither Number.isNaN nor isNaN answers the "is this not a number?" question. Instead, they answer whether value, or Number(value), is NaN.

In most cases, what you actually want to know is whether a value is of type 'number' and is also not NaN, given that typeof NaN === 'number'. The isNumber function in the following code snippet does just that. Note that it’d work with both isNaN and Number.isNaN due to type checking. Everything that reports a typeof value of 'number' is a number, except for NaN, so we filter those out as false positive results.

function isNumber(value) {
  return typeof value === 'number' && !Number.isNaN(value)
}

You can use that method to figure out whether a value is a number or not. In the next snippet there are a few examples of how isNumber works.

isNumber(1)
// <- true
isNumber(Infinity)
// <- true
isNumber(NaN)
// <- false
isNumber('two')
// <- false
isNumber(new Date())
// <- false

There is a function, which was already in the language, that somewhat resembles our custom isNumber function: isFinite.

Number.isFinite

The rarely promoted isFinite method has been available since ES3. It returns a Boolean value indicating whether the provided value matches none of Infinity, -Infinity, and NaN.

The isFinite method coerces values through Number(value), while Number.isFinite doesn’t. This means that values that can be coerced into finite numbers will be considered finite numbers by isFinite, even though they aren’t explicit numbers.

Here are a few examples using the global isFinite function.

isFinite(NaN)
// <- false
isFinite(Infinity)
// <- false
isFinite(-Infinity)
// <- false
isFinite(null)
// <- true, because Number(null) is 0
isFinite(-13)
// <- true, because Number(-13) is -13
isFinite('10')
// <- true, because Number('10') is 10

Using Number.isFinite is a safer bet, as it doesn’t incur unexpected casting. You could always use Number.isFinite(Number(value)) if you did want the value to be cast into its numeric representation. Separating the two aspects, casting versus computing, results in more explicit code.

Here are a few examples using the Number.isFinite method.

Number.isFinite(NaN)
// <- false
Number.isFinite(Infinity)
// <- false
Number.isFinite(-Infinity)
// <- false
Number.isFinite(null)
// <- false, because null is not a number
Number.isFinite(-13)
// <- true
Number.isFinite('10')
// <- false, because '10' is not a number

Creating a ponyfill for Number.isFinite would involve returning false for nonnumeric values, effectively turning off the type-casting feature, and then calling isFinite on the input value.

function numberIsFinite(value) {
  return typeof value === 'number' && isFinite(value)
}

Number.parseInt

The Number.parseInt method works the same as parseInt. It is, in fact, the same.

console.log(Number.parseInt === parseInt)
// <- true

The parseInt function has support for hexadecimal literal notation in strings. Specifying the radix is not even necessary: based on the 0x prefix, parseInt infers that the number must be base 16.

parseInt('0xf00')
// <- 3840
parseInt('0xf00', 16)
// <- 3840

If you provided another radix, parseInt would bail after the first nondigit character.

parseInt('0xf00', 10)
// <- 0
parseInt('5xf00', 10)
// <- 5, illustrating there's no special treatment here

While parseInt accepts input in hexadecimal literal notation strings, its interface hasn’t changed in ES6. Therefore, binary and octal literal notation strings won’t be interpreted as such. This introduces a new inconsistency in ES6, where parseInt understands 0x, but not 0b nor 0o.

parseInt('0b011')
// <- 0
parseInt('0b011', 2)
// <- 0
parseInt('0o100')
// <- 0
parseInt('0o100', 8)
// <- 0

It’s up to you to drop the prefix before parseInt, if you wanted to use parseInt to read these literals. You’ll also need to specify the corresponding radix of 2 for binary numbers or 8 for octals.

parseInt('0b011'.slice(2), 2)
// <- 3
parseInt('0o110'.slice(2), 8)
// <- 72

In contrast, the Number function is perfectly able to cast these strings into the correct numbers.

Number('0b011')
// <- 3
Number('0o110')
// <- 72
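
If you’d rather stay with parseInt, for instance because you want its tolerance for trailing characters, the prefix handling can be wrapped in a small helper. The parsePrefixedInt function and the radixByPrefix lookup below are hypothetical names for a minimal sketch. It assumes the input is a string and falls back to base 10 when no known prefix is found.

```javascript
// Hypothetical lookup: maps each literal prefix to its radix.
const radixByPrefix = { '0b': 2, '0o': 8, '0x': 16 }

function parsePrefixedInt(text) {
  const prefix = text.slice(0, 2).toLowerCase()
  const radix = radixByPrefix[prefix]
  if (radix === undefined) {
    return parseInt(text, 10) // no known prefix, assume decimal
  }
  return parseInt(text.slice(2), radix) // drop the prefix, use its radix
}

console.log(parsePrefixedInt('0b011')) // <- 3
console.log(parsePrefixedInt('0o110')) // <- 72
console.log(parsePrefixedInt('0xf00')) // <- 3840
console.log(parsePrefixedInt('42px')) // <- 42
```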

Number.parseFloat

Like parseInt, parseFloat was added to Number without any modifications whatsoever.

console.log(Number.parseFloat === parseFloat)
// <- true

Luckily, parseFloat didn’t have any special behavior with regard to hexadecimal literal strings, meaning that Number.parseFloat is unlikely to introduce any confusion.

The parseFloat function was added to Number for completeness. In future versions of the language, there will be less global namespace pollution. When a function serves a specific purpose, it’ll be added to the relevant built-in, rather than as a global.

Number.isInteger

This is a new method coming in ES6, and it wasn’t previously available as a global function. The isInteger method returns true if the provided value is a finite number that doesn’t have a decimal part.

console.log(Number.isInteger(Infinity)) // <- false
console.log(Number.isInteger(-Infinity)) // <- false
console.log(Number.isInteger(NaN)) // <- false
console.log(Number.isInteger(null)) // <- false
console.log(Number.isInteger(0)) // <- true
console.log(Number.isInteger(-10)) // <- true
console.log(Number.isInteger(10.3)) // <- false

You might want to consider the following code snippet as a ponyfill for Number.isInteger. The modulus operator returns the remainder of dividing the same operands. If we divide by one, we’re effectively getting the decimal part. If that’s 0, then it means the number is an integer.

function numberIsInteger(value) {
  return Number.isFinite(value) && value % 1 === 0
}

Next up we’ll dive into floating-point arithmetic, which is well-documented as having interesting corner cases.

Number.EPSILON

The EPSILON property is a new constant value being added to the Number built-in. The following snippet shows its value.

Number.EPSILON
// <- 2.220446049250313e-16
Number.EPSILON.toFixed(20)
// <- '0.00000000000000022204'

Let’s take a look at the canonical example of floating-point arithmetic.

0.1 + 0.2
// <- 0.30000000000000004
0.1 + 0.2 === 0.3
// <- false

What’s the margin of error in this operation? Let’s move the operands around and find out.

0.1 + 0.2 - 0.3
// <- 5.551115123125783e-17
5.551115123125783e-17.toFixed(20)
// <- '0.00000000000000005551'

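We could use Number.EPSILON to figure out whether the difference is small enough to be negligible. Number.EPSILON is the difference between 1 and the smallest floating-point number greater than 1, which makes it a reasonable margin of error for rounding in floating-point arithmetic on values of roughly that magnitude.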

5.551115123125783e-17 < Number.EPSILON
// <- true

The following piece of code can be used to figure out whether the result of a floating-point operation is within the expected margin of error. We use Math.abs, because that way the order of left and right won’t matter. In other words, withinMarginOfError(left, right) will produce the same result as withinMarginOfError(right, left).

function withinMarginOfError(left, right) {
  return Math.abs(left - right) < Number.EPSILON
}

The next snippet shows withinMarginOfError in action.

withinMarginOfError(0.1 + 0.2, 0.3)
// <- true
withinMarginOfError(0.2 + 0.2, 0.3)
// <- false
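
Keep in mind that the gap between neighboring floating-point values grows with their magnitude, so a fixed margin of Number.EPSILON is only meaningful for values around 1. The following is a sketch of one possible magnitude-aware comparison. The withinRelativeMargin function is an assumption made for illustration, not part of ES6, and the right tolerance ultimately depends on your program.

```javascript
// Hypothetical helper: scales the margin by the largest operand.
function withinRelativeMargin(left, right) {
  const scale = Math.max(1, Math.abs(left), Math.abs(right))
  return Math.abs(left - right) <= Number.EPSILON * scale
}

// Two neighboring floating-point values around 2 to the power of 20.
const big = Math.pow(2, 20)
const next = big + Math.pow(2, -32)

console.log(next - big < Number.EPSILON)
// <- false, the gap between neighbors grew with the magnitude
console.log(withinRelativeMargin(big, next))
// <- true
console.log(withinRelativeMargin(0.1 + 0.2, 0.3))
// <- true
console.log(withinRelativeMargin(0.2 + 0.2, 0.3))
// <- false
```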

Using floating-point representation, not every integer can be represented precisely.

Number.MAX_SAFE_INTEGER and Number.MIN_SAFE_INTEGER

Number.MAX_SAFE_INTEGER is the largest integer that can be safely and precisely represented in JavaScript, or in any language that represents integers using floating point as specified by IEEE 754, the floating-point standard, for that matter. The next bit of code shows exactly how large Number.MAX_SAFE_INTEGER is.

Number.MAX_SAFE_INTEGER === Math.pow(2, 53) - 1
// <- true
Number.MAX_SAFE_INTEGER === 9007199254740991
// <- true

As you might expect, there’s also the opposite constant: the minimum. It’s the negative value of Number.MAX_SAFE_INTEGER.

Number.MIN_SAFE_INTEGER === -Number.MAX_SAFE_INTEGER
// <- true
Number.MIN_SAFE_INTEGER === -9007199254740991
// <- true

Floating point arithmetic becomes unreliable beyond the [MIN_SAFE_INTEGER, MAX_SAFE_INTEGER] range. The 1 === 2 statement evaluates to false, because these are different values. If we add Number.MAX_SAFE_INTEGER to each operand, however, it’d seem 1 === 2 is indeed true.

1 === 2
// <- false
Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2
// <- true
Number.MIN_SAFE_INTEGER - 1 === Number.MIN_SAFE_INTEGER - 2
// <- true

When it comes to checking whether an integer is safe, a Number.isSafeInteger function has been added to the language.

Number.isSafeInteger

This method returns true for any integer in the [MIN_SAFE_INTEGER, MAX_SAFE_INTEGER] range. Like with other Number methods introduced in ES6, there’s no type coercion involved. The input must be numeric, an integer, and within the aforementioned bounds in order for the method to return true. The next snippet shows a comprehensive set of inputs and outputs.

Number.isSafeInteger('one') // <- false
Number.isSafeInteger('0') // <- false
Number.isSafeInteger(null) // <- false
Number.isSafeInteger(NaN) // <- false
Number.isSafeInteger(Infinity) // <- false
Number.isSafeInteger(-Infinity) // <- false
Number.isSafeInteger(Number.MIN_SAFE_INTEGER - 1) // <- false
Number.isSafeInteger(Number.MIN_SAFE_INTEGER) // <- true
Number.isSafeInteger(1) // <- true
Number.isSafeInteger(1.2) // <- false
Number.isSafeInteger(Number.MAX_SAFE_INTEGER) // <- true
Number.isSafeInteger(Number.MAX_SAFE_INTEGER + 1) // <- false

When we want to verify if the result of an operation is within bounds, we must verify not only the result but also both operands, as Dr. Axel Rauschmayer points out in the article “New number and Math features in ES6”. One or both of the operands may be out of bounds, while the result is within bounds but incorrect. Similarly, the result may be out of bounds even if both operands are within bounds. Checking all of left, right, and the result of left op right is, thus, necessary to verify that we can indeed trust the result.

In the following example both operands are within bounds, but the result is incorrect.

Number.isSafeInteger(9007199254740000)
// <- true
Number.isSafeInteger(993)
// <- true
Number.isSafeInteger(9007199254740000 + 993)
// <- false
9007199254740000 + 993
// <- 9007199254740992, should be 9007199254740993

Certain operations and numbers, such as the one in the following code snippet, may return correct results even when the result is out of bounds. The fact that correct results can’t be guaranteed, however, means that these operations can’t be trusted.

9007199254740000 + 994
// <- 9007199254740994

In the next example, one of the operands is out of bounds, and thus we can’t trust the result to be accurate.

Number.isSafeInteger(9007199254740993)
// <- false
Number.isSafeInteger(990)
// <- true
Number.isSafeInteger(9007199254740993 + 990)
// <- false
9007199254740993 + 990
// <-  9007199254741982, should be 9007199254741983

A subtraction in our last example would produce a result that is within bounds, but that result would also be inaccurate.

Number.isSafeInteger(9007199254740993)
// <- false
Number.isSafeInteger(990)
// <- true
Number.isSafeInteger(9007199254740993 - 990)
// <- true
9007199254740993 - 990
// <-  9007199254740002, should be 9007199254740003

If both operands are out of bounds, the output could end up in the safe space, even though the result is incorrect.

Number.isSafeInteger(9007199254740995)
// <- false
Number.isSafeInteger(9007199254740993)
// <- false
Number.isSafeInteger(9007199254740995 - 9007199254740993)
// <- true
9007199254740995 - 9007199254740993
// <- 4, should be 2

We can conclude that the only safe way to assert whether an operation produces correct output is with a utility function such as the one shown next. If we can’t ascertain that the operation and both operands are within bounds, then the result may be inaccurate, and that’s a problem. It’s best to throw in those situations and have a way to error-correct, but that’s specific to your programs. The important part is to actually catch these kinds of difficult bugs to deal with.

function safeOp(result, ...operands) {
  const values = [result, ...operands]
  if (!values.every(Number.isSafeInteger)) {
    throw new RangeError('Operation cannot be trusted!')
  }
  return result
}

You could use safeOp to ensure all operands, including the result, are safely within bounds.

safeOp(9007199254740000 + 993, 9007199254740000, 993)
// <- RangeError: Operation cannot be trusted!
safeOp(9007199254740993 + 990, 9007199254740993, 990)
// <- RangeError: Operation cannot be trusted!
safeOp(9007199254740993 - 990, 9007199254740993, 990)
// <- RangeError: Operation cannot be trusted!
safeOp(
  9007199254740993 - 9007199254740995,
  9007199254740993,
  9007199254740995
)
// <- RangeError: Operation cannot be trusted!
safeOp(1 + 2, 1, 2)
// <- 3
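
Passing the operands twice gets repetitive, so in practice you might wrap safeOp in per-operator helpers. The safeAdd and safeSubtract functions below are hypothetical wrappers, sketched on top of the safeOp function shown earlier, which is repeated here so that the snippet runs on its own.

```javascript
function safeOp(result, ...operands) {
  const values = [result, ...operands]
  if (!values.every(Number.isSafeInteger)) {
    throw new RangeError('Operation cannot be trusted!')
  }
  return result
}

// Hypothetical wrappers: compute the result and validate it in one step.
const safeAdd = (left, right) => safeOp(left + right, left, right)
const safeSubtract = (left, right) => safeOp(left - right, left, right)

console.log(safeAdd(1, 2)) // <- 3
console.log(safeSubtract(10, 4)) // <- 6

try {
  safeAdd(9007199254740000, 993)
} catch (error) {
  console.log(error.message) // <- 'Operation cannot be trusted!'
}
```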

That’s all there is when it comes to Number, but we’re not done with arithmetic-related improvements quite yet. Let’s turn our attention to the Math built-in.

Math

ES6 introduces heaps of new static methods to the Math built-in. Some of them were specifically engineered toward making it easier to compile C into JavaScript, and you’ll seldom need them for day-to-day JavaScript application development. Others are complements to the existing rounding, exponentiation, and trigonometry API surface.

Let’s get right to it.

Math.sign

Many languages have a mathematical sign method that returns a vector (-1, 0, or 1) representation for the sign of the provided input. JavaScript’s Math.sign method does exactly that. However, the JavaScript flavor of this method has two more possible return values: -0 and NaN. Check out the examples in the following code snippet.

Math.sign(1) // <- 1
Math.sign(0) // <- 0
Math.sign(-0) // <- -0
Math.sign(-30) // <- -1
Math.sign(NaN) // <- NaN
Math.sign('one') // <- NaN, because Number('one') is NaN
Math.sign('0') // <- 0, because Number('0') is 0
Math.sign('7') // <- 1, because Number('7') is 7

Note how Math.sign casts its input into numeric values? While methods introduced to the Number built-in don’t cast their input via Number(value), most of the methods added to Math share this trait, as we shall see.
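
The five possible return values are easier to see in a hand-rolled version. The mathSign function below is a hypothetical ponyfill sketch. It assumes that casting through Number(value) is close enough to what the native method does for the inputs shown.

```javascript
// Hypothetical ponyfill for Math.sign.
function mathSign(value) {
  const number = Number(value)
  // 0, -0, and NaN are returned as they are. NaN is the only value
  // that isn't equal to itself.
  if (number === 0 || number !== number) {
    return number
  }
  return number > 0 ? 1 : -1
}

console.log(mathSign(-30)) // <- -1
console.log(mathSign('7')) // <- 1
console.log(Object.is(mathSign(-0), -0)) // <- true
console.log(mathSign('one')) // <- NaN
```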

Math.trunc

We already had Math.floor and Math.ceil in JavaScript, with which we can round a number down or up, respectively. Now we also have Math.trunc as an alternative, which discards the decimal part without any rounding. Here, too, the input is coerced into a numeric value through Number(value).

Math.trunc(12.34567) // <- 12
Math.trunc(-13.58) // <- -13
Math.trunc(-0.1234) // <- -0
Math.trunc(NaN) // <- NaN
Math.trunc('one') // <- NaN, because Number('one') is NaN
Math.trunc('123.456') // <- 123, because Number('123.456') is 123.456

Creating a simple ponyfill for Math.trunc would involve checking whether the value is greater than zero and applying one of Math.floor or Math.ceil, as shown in the following code snippet.

function mathTrunc(value) {
  return value > 0 ? Math.floor(value) : Math.ceil(value)
}
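
The difference between these methods only shows up with negative numbers that have a decimal part. The next snippet, added here as an illustration, runs the same value through each of them.

```javascript
const value = -13.58

console.log(Math.floor(value)) // <- -14, rounds toward -Infinity
console.log(Math.ceil(value)) // <- -13, rounds toward Infinity
console.log(Math.round(value)) // <- -14, rounds to the nearest integer
console.log(Math.trunc(value)) // <- -13, drops the decimal part
```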

Math.cbrt

The Math.cbrt method is short for "cube root," similarly to how Math.sqrt is short for "square root." The following snippet has a few usage examples.

Math.cbrt(-1) // <- -1
Math.cbrt(3) // <- 1.4422495703074083
Math.cbrt(8) // <- 2
Math.cbrt(27) // <- 3

Note that this method also coerces nonnumerical values into numbers.

Math.cbrt('8') // <- 2, because Number('8') is 8
Math.cbrt('one') // <- NaN, because Number('one') is NaN
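
Before ES6 you might have reached for Math.pow with an exponent of 1 / 3, but that breaks down for negative numbers. The mathCbrt function below is a hypothetical sketch of a ponyfill that works around the problem. Its results may differ from the native method by a rounding error.

```javascript
// Math.pow can't raise a negative base to a fractional power.
console.log(Math.pow(-8, 1 / 3)) // <- NaN
console.log(Math.cbrt(-8)) // <- -2

// Hypothetical ponyfill: take the root of the magnitude, restore the sign.
function mathCbrt(value) {
  const number = Number(value)
  const root = Math.pow(Math.abs(number), 1 / 3)
  return number < 0 ? -root : root
}

console.log(mathCbrt(-8)) // approximately -2
```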

Let’s move on.

Math.expm1

This operation is the result of computing e to the power of value, and then subtracting 1. In JavaScript, the e constant is defined as Math.E. The function in the following snippet is a rough equivalent of Math.expm1.

function expm1(value) {
  return Math.pow(Math.E, value) - 1
}

The e^value^ operation can be expressed as Math.exp(value) as well.

function expm1(value) {
  return Math.exp(value) - 1
}

Note that Math.expm1 has higher precision than merely doing Math.exp(value) - 1, and should be the preferred alternative.

expm1(1e-20)
// <- 0
Math.expm1(1e-20)
// <- 1e-20
expm1(1e-10)
// <- 1.000000082740371e-10
Math.expm1(1e-10)
// <- 1.00000000005e-10

The inverse function of Math.expm1 is Math.log1p.

Math.log1p

This is the natural logarithm of value plus 1—ln(value + 1)—and the inverse function of Math.expm1. The base e logarithm of a number can be expressed as Math.log in JavaScript.

function log1p(value) {
  return Math.log(value + 1)
}

Just like with Math.expm1, the Math.log1p method is more precise than executing the Math.log(value + 1) operation by hand.

log1p(1.00000000005e-10)
// <- 1.000000082690371e-10
Math.log1p(1.00000000005e-10)
// <- 1e-10, exactly the inverse of Math.expm1(1e-10)

Math.log10

Base 10 logarithm of a number—log~10~(value).

Math.log10(1000)
// <- 3

You could ponyfill Math.log10 using the Math.LN10 constant.

function mathLog10(value) {
  return Math.log(value) / Math.LN10
}

And then there’s Math.log2.

Math.log2

Base 2 logarithm of a number—log~2~(value).

Math.log2(1024)
// <- 10

You could ponyfill Math.log2 using the Math.LN2 constant.

function mathLog2(value) {
  return Math.log(value) / Math.LN2
}

Note that the ponyfill version won’t be as precise as Math.log2, as demonstrated in the following example.

Math.log2(1 << 29) // native implementation
// <- 29
mathLog2(1 << 29) // ponyfill implementation
// <- 29.000000000000004

The << operator performs a "bitwise left shift". In this operation, the bits on the binary representation of the lefthand-side number are shifted as many places to the left as indicated in the righthand side of the operation. The following couple of examples show how shifting works, using the binary literal notation introduced in Binary and Octal Literals.

0b00000001 // 1
0b00000001 << 2 // shift bits two places to the left
0b00000100 // 4
0b00001101 // 13
0b00001101 << 4 // shift bits four places to the left
0b11010000 // 208
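
Another way to look at it is that shifting 1 by n places computes 2 to the power of n, as long as the result fits in a signed 32-bit integer. The following sketch checks that claim and shows what happens at the boundaries.

```javascript
// Compare the shift against Math.pow for every count that fits.
for (let n = 0; n <= 30; n++) {
  if ((1 << n) !== Math.pow(2, n)) {
    throw new Error(`Mismatch at ${ n }`)
  }
}

console.log(1 << 10) // <- 1024
console.log(1 << 31) // <- -2147483648, the sign bit was set
console.log(1 << 32) // <- 1, the shift count wraps around at 32
```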

Trigonometric Functions

The Math object is getting hyperbolic trigonometric functions in ES6:

  • Math.sinh(value) returns the hyperbolic sine of value

  • Math.cosh(value) returns the hyperbolic cosine of value

  • Math.tanh(value) returns the hyperbolic tangent of value

  • Math.asinh(value) returns the hyperbolic arc-sine of value

  • Math.acosh(value) returns the hyperbolic arc-cosine of value

  • Math.atanh(value) returns the hyperbolic arc-tangent of value
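
These functions can all be expressed in terms of Math.exp and Math.log. The following rough equivalents are a sketch meant to show the underlying formulas. They aren’t how engines implement the native methods, and they may be less precise.

```javascript
// Rough equivalents, assumed here for illustration only.
const sinh = value => (Math.exp(value) - Math.exp(-value)) / 2
const cosh = value => (Math.exp(value) + Math.exp(-value)) / 2
const tanh = value => sinh(value) / cosh(value)
const asinh = value => Math.log(value + Math.sqrt(value * value + 1))

console.log(sinh(0)) // <- 0
console.log(cosh(0)) // <- 1
console.log(Math.abs(tanh(1) - Math.tanh(1)) < 1e-12) // <- true
console.log(Math.abs(asinh(1) - Math.asinh(1)) < 1e-12) // <- true
```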

Math.hypot

Using Math.hypot returns the square root of the sum of the squares of every provided argument.

Math.hypot(1, 2, 3)
// <- 3.741657386773941, the square root of (1*1 + 2*2 + 3*3)

We could ponyfill Math.hypot by performing these operations manually. We can use Math.sqrt to compute the square root and Array#reduce, combined with rest parameters, to sum the squares. You can go deeper into functional Array methods by reading the article "Fun with Native Arrays".

function mathHypot(...values) {
  const accumulateSquares = (total, value) =>
    total + value * value
  const squares = values.reduce(accumulateSquares, 0)
  return Math.sqrt(squares)
}

Our handmade function is, surprisingly, more precise than the native one for this particular use case. In the next code sample, we see the hand-rolled hypot function offers precision with one more decimal place.

Math.hypot(1, 2, 3) // native implementation
// <- 3.741657386773941
mathHypot(1, 2, 3) // ponyfill implementation
// <- 3.7416573867739413

Bitwise Computation Helpers

At the beginning of Math, we talked about how some of the new Math methods are specifically engineered towards making it easier to compile C into JavaScript. Those are the last three methods we’ll cover, and they help us deal with 32-bit numbers.

Math.clz32

The name for this method is an acronym for "count leading zero bits in 32-bit binary representations of a number." Keeping in mind that the << operator performs a "bitwise left shift," let’s take a look at the next code snippet describing sample input and output for Math.clz32.

Math.clz32(0) // <- 32
Math.clz32(1) // <- 31
Math.clz32(1 << 1) // <- 30
Math.clz32(1 << 2) // <- 29
Math.clz32(1 << 29) // <- 2
Math.clz32(1 << 31) // <- 0
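
One thing you could do with Math.clz32 is figure out how many bits it takes to write a number in binary. The bitLength function below is a hypothetical helper that assumes its input fits in an unsigned 32-bit integer.

```javascript
// Hypothetical helper: 32 bits minus the leading zeros.
function bitLength(value) {
  return 32 - Math.clz32(value)
}

console.log(bitLength(0)) // <- 0
console.log(bitLength(1)) // <- 1
console.log(bitLength(0b1101)) // <- 4
console.log(bitLength(255)) // <- 8
console.log(bitLength(256)) // <- 9
```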

Math.imul

Returns the result of a C-like 32-bit multiplication.
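
As an illustration of what "C-like" means here, the following sketch compares Math.imul against the regular multiplication operator. Both operands are treated as signed 32-bit integers, and the result wraps around the way it would in C. The max name is made up for the example.

```javascript
console.log(Math.imul(3, 4)) // <- 12
console.log(Math.imul(0xffffffff, 5))
// <- -5, because 0xffffffff is -1 as a signed 32-bit integer

const max = 0x7fffffff // largest signed 32-bit integer
console.log(Math.imul(max, max))
// <- 1, the low 32 bits of the exact product
console.log((max * max) | 0)
// <- 0, the floating-point product already lost its low bits
```

The last example hints at why the method exists: multiplying as floating point and then truncating to 32 bits isn’t reliable once the product goes beyond the safe integer range.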

Math.fround

Rounds value to the nearest 32-bit float representation of a number.
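
The next snippet is a short sketch of the effect. Values that can be represented exactly in 32 bits are returned unchanged, while others pick up a rounding error, matching what you’d read back from a Float32Array.

```javascript
console.log(Math.fround(5.5))
// <- 5.5, exactly representable in 32 bits
console.log(Math.fround(5.05))
// <- 5.050000190734863
console.log(Math.fround(0.1) === 0.1)
// <- false

const floats = new Float32Array([0.1])
console.log(floats[0] === Math.fround(0.1))
// <- true
```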

Strings and Unicode

You may recall template literals from [template_literals], and how those can be used to mix strings and variables, or any valid JavaScript expression, to produce string output.

function greet(name) {
  return `Hello, ${ name }!`
}
greet('Gandalf')
// <- 'Hello, Gandalf!'

Besides the template literal syntax, strings got a number of new methods in ES6. These can be categorized as string manipulation methods and Unicode-related methods. Let’s start with the former.

String#startsWith

Prior to ES6, whenever we wanted to check if a string begins with a certain other string, we’d use the String#indexOf method, as shown in the following code snippet. A result of 0 means that the string starts with the provided value.

'hello gary'.indexOf('gary')
// <- 6
'hello gary'.indexOf('hello')
// <- 0
'hello gary'.indexOf('stephan')
// <- -1

If you wanted to check if a string started with another one, then, you’d compare them with String#indexOf and check whether the lookup value is found at the beginning of the string: the 0 index.

'hello gary'.indexOf('gary') === 0
// <- false
'hello gary'.indexOf('hello') === 0
// <- true
'hello gary'.indexOf('stephan') === 0
// <- false

You can now use the String#startsWith method instead, avoiding the unnecessary complexity of checking whether an index matches 0.

'hello gary'.startsWith('gary')
// <- false
'hello gary'.startsWith('hello')
// <- true
'hello gary'.startsWith('stephan')
// <- false

In order to figure out whether a string contains a value starting at a specific index, using String#indexOf, we would have to grab a slice of that string first.

'hello gary'.slice(6).indexOf('gary') === 0
// <- true

We can’t simply check whether the index is 6, because that would give you false negatives when the queried value is found before reaching that index of 6. The following example shows how, even when the query 'ell' string is indeed at index 6, merely comparing the String#indexOf result with 6 is insufficient to attain a correct result.

'hello ell'.indexOf('ell') === 6
// <- false, because the result was 1

We could use the startIndex parameter for indexOf to get around this problem without relying on String#slice. Note that we’re still comparing against 6 in this case, because the string wasn’t sliced up in a setup operation.

'hello ell'.indexOf('ell', 6) === 6
// <- true

Instead of keeping all of these string searching implementation details in your head and writing code that’s most concerned with how to search, as opposed to what is being searched, we could use String#startsWith passing in the optional startIndex parameter as well.

'hello ell'.startsWith('ell', 6)
// <- true
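
To tie the two approaches together, here’s a minimal sketch of how String#startsWith could be approximated using String#slice. The startsWith function below is hypothetical and skips details of the native method, such as rejecting regular expressions.

```javascript
// Rough, hypothetical equivalent of String#startsWith.
function startsWith(text, search, startIndex = 0) {
  const start = Math.max(0, startIndex)
  return text.slice(start, start + search.length) === search
}

console.log(startsWith('hello gary', 'hello')) // <- true
console.log(startsWith('hello gary', 'gary')) // <- false
console.log(startsWith('hello ell', 'ell', 6)) // <- true
```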

String#endsWith

This method mirrors String#startsWith in the same way that String#lastIndexOf mirrors String#indexOf. It tells us whether a string ends with another string.

'hello gary'.endsWith('gary')
// <- true
'hello gary'.endsWith('hello')
// <- false

As the opposite of String#startsWith, there’s a position index that indicates where the lookup should end, instead of where it should start. It defaults to the length of the string.

'hello gary'.endsWith('gary', 10)
// <- true
'hello gary'.endsWith('gary', 9)
// <- false, it ends with 'gar' in this case
'hello gary'.endsWith('hell', 4)
// <- true
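
The position parameter is easier to reason about when you think of it as cutting the string at that index before checking how it ends. The hypothetical endsWith function below is a sketch of that idea, and not the native implementation.

```javascript
// Rough, hypothetical equivalent of String#endsWith.
function endsWith(text, search, endIndex = text.length) {
  const end = Math.min(Math.max(0, endIndex), text.length)
  const start = end - search.length
  return start >= 0 && text.slice(start, end) === search
}

console.log(endsWith('hello gary', 'gary')) // <- true
console.log(endsWith('hello gary', 'gary', 9)) // <- false
console.log(endsWith('hello gary', 'hell', 4)) // <- true
```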

String#includes is one last method that can simplify a specific use case for String#indexOf.

String#includes

You can use String#includes to figure out whether a string contains another one, as shown in the following piece of code.

'hello gary'.includes('hell')
// <- true
'hello gary'.includes('ga')
// <- true
'hello gary'.includes('rye')
// <- false

This is equivalent to the ES5 use case of String#indexOf where we’d test the result against -1, checking to see whether the search string was anywhere to be found, as demonstrated in the next code snippet.

'hello gary'.indexOf('ga') !== -1
// <- true
'hello gary'.indexOf('rye') !== -1
// <- false

You can also provide String#includes with a start index where searching should begin.

'hello gary'.includes('ga', 4)
// <- true
'hello gary'.includes('ga', 7)
// <- false

Let’s move on to something that’s not just a String#indexOf alternative.

String#repeat

This handy method allows you to repeat a string count times.

'ha'.repeat(1)
// <- 'ha'
'ha'.repeat(2)
// <- 'haha'
'ha'.repeat(5)
// <- 'hahahahaha'
'ha'.repeat(0)
// <- ''

The provided count should be a non-negative finite number.

'ha'.repeat(Infinity)
// <- RangeError
'ha'.repeat(-1)
// <- RangeError

Decimal values are floored to the nearest integer.

'ha'.repeat(3.9)
// <- 'hahaha', count was floored to 3

Using NaN is interpreted as a count of 0.

'ha'.repeat(NaN)
// <- ''

Non-numeric values are coerced into numbers.

'ha'.repeat('ha')
// <- '', because Number('ha') is NaN
'ha'.repeat('3')
// <- 'hahaha', because Number('3') is 3

Values in the (-1, 0) range are rounded to -0 because count is passed through ToInteger, as documented by the specification (see String#repeat in the ECMAScript 6 Specification, section 21.1.3.13). That step in the specification dictates that count be cast with a formula like the one in the next code snippet.

function ToInteger(number) {
  return Math.floor(Math.abs(number)) * Math.sign(number)
}

The ToInteger function translates any values in the (-1, 0) range into -0. As a result, when passed to String#repeat, numbers in the (-1, 0) range will be treated as zero, while numbers in the (-Infinity, -1] range will result in an exception, as we learned earlier.

'na'.repeat(-0.1)
// <- '', because count was rounded to -0
'na'.repeat(-0.9)
// <- '', because count was rounded to -0
'na'.repeat(-0.9999)
// <- '', because count was rounded to -0
'na'.repeat(-1)
// <- Uncaught RangeError: Invalid count value
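
You can observe the -0 result directly with Object.is, which, unlike the === operator, tells 0 and -0 apart. The snippet below repeats the ToInteger sketch from earlier so that it can run on its own. Note that this formula is only an approximation: the specification maps NaN to 0, whereas the formula would produce NaN.

```javascript
function ToInteger(number) {
  return Math.floor(Math.abs(number)) * Math.sign(number)
}

console.log(Object.is(ToInteger(-0.9), -0))
// <- true
console.log(-0 === 0)
// <- true, which is why -0 works as a count of zero
console.log(ToInteger(3.9))
// <- 3
console.log(ToInteger(-1.5))
// <- -1, which String#repeat rejects with a RangeError
```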

An example use case for String#repeat may be the typical padding function. The indent function in the next code snippet takes a multiline string and indents every line with as many spaces as desired, using a default of two spaces.

function indent(text, spaces = 2) {
  return text
    .split('\n')
    .map(line => ' '.repeat(spaces) + line)
    .join('\n')
}

indent(`a
b
c`, 2)
// <- '  a\n  b\n  c'

String Padding and Trimming

At the time of this writing, there are two new string padding methods slated for publication in ES2017: String#padStart and String#padEnd. Using these methods, we wouldn’t have to implement something like indent in the previous code snippet. When performing string manipulation, we often want to pad a string so that it’s formatted consistently with a style we have in mind. This can be useful when formatting numbers, currency, HTML, and in a variety of other cases usually involving monospaced text.

Using padStart, we will specify the desired length for the target string and the padding string, which defaults to a single space character. If the original string is at least as long as the specified length, padStart will result in a null operation, returning the original string unchanged.

In the following example, the desired length of a properly padded string is 5, and the original string already has a length of at least 5, so it’s returned unchanged.

'01.23'.padStart(5)
// <- '01.23'

In the next example, the original string has a length of 4, thus padStart adds a single space at the beginning of the string, bringing the length to the desired value of 5.

'1.23'.padStart(5)
// <- ' 1.23'

The next example is just like the previous one, except it uses '0' for padding instead of the default ' ' value.

'1.23'.padStart(5, '0')
// <- '01.23'

Note that padStart will keep padding the string until the maximum length is reached.

'1.23'.padStart(7, '0')
// <- '0001.23'

However, if the padding string is too long, it may be truncated. The provided length is the maximum length of the padded string, except in the case where the original string is already longer than that.

'1.23'.padStart(7, 'abcdef')
// <- 'abc1.23'

The padEnd method has a similar API, but it adds the padding at the end of the original string, instead of at the beginning. The following snippet illustrates the difference.

'01.23'.padEnd(5) // <- '01.23'
'1.23'.padEnd(5) // <- '1.23 '
'1.23'.padEnd(5, '0') // <- '1.230'
'1.23'.padEnd(7, '0') // <- '1.23000'
'1.23'.padEnd(7, 'abcdef') // <- '1.23abc'
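To illustrate the kind of monospaced formatting mentioned earlier, here is a small sketch that combines both methods to line up two columns. The formatRow helper, its width parameter, and the sample data are made up for this example.

```javascript
// Hypothetical helper: renders a label and an amount as two aligned columns
function formatRow(label, amount, width = 10) {
  const left = label.padEnd(width, '.') // fill the gap after the label with dots
  const right = amount.toFixed(2).padStart(width) // right-align the amount
  return left + right
}

const rows = [
  formatRow('coffee', 3.5),
  formatRow('sandwich', 12)
]
console.log(rows.join('\n'))
// coffee....      3.50
// sandwich..     12.00
```

Both rows end up exactly 20 characters long, regardless of how long each label or amount was to begin with.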

At the time of this writing, there’s a proposal for string trimming in stage 2, containing the String#trimStart and String#trimEnd methods. Using trimStart removes any whitespace from the beginning of a string, while using trimEnd removes any whitespace from the end of a string.

'   this should be left-aligned   '.trimStart()
// <- 'this should be left-aligned   '
'   this should be right-aligned   '.trimEnd()
// <- '   this should be right-aligned'

Let’s switch protocols and learn about Unicode.

Unicode

JavaScript strings are represented using UTF-16 code units (to learn more, read about UCS-2, UCS-4, UTF-16, and UTF-32). Each code unit can be used to represent a code point in the [U+0000, U+FFFF] range, also known as the BMP, short for Basic Multilingual Plane. You can represent individual code points in the BMP using the '\u3456' syntax. You could also represent code units in the [U+0000, U+00FF] range using the \x00..\xff notation. For instance, '\xbb' represents '»', the U+00BB code point, as you can also verify by doing String.fromCharCode(0xbb).

For code points beyond U+FFFF, you'd represent them as a surrogate pair. That is to say, two contiguous code units. For instance, the horse emoji (🐎) code point is represented with the '\ud83d\udc0e' contiguous code units. In ES6 notation you can also represent code points using the '\u{1f40e}' notation (that example is also the horse emoji).

Note that the internal representation hasn’t changed, so there are still two code units behind that single code point. In fact, '\u{1f40e}'.length evaluates to 2, one for each code unit.
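The relationship between a code point and its surrogate pair is plain arithmetic, as defined by UTF-16: subtract 0x10000, then spread the remaining 20 bits over two code units. The toSurrogatePair helper below is a hypothetical name used only for this sketch, and it assumes its input lies above U+FFFF.

```javascript
// Hypothetical helper: splits an astral code point (above U+FFFF)
// into the two UTF-16 code units of its surrogate pair
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000
  const high = 0xd800 + (offset >> 10) // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff) // bottom 10 bits
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1f40e)
console.log(high.toString(16))
// <- 'd83d'
console.log(low.toString(16))
// <- 'dc0e'
console.log(String.fromCharCode(high, low) === '\u{1f40e}')
// <- true
```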

The '\ud83d\udc0e\ud83d\udc71\u2764' string, found in the next code snippet, evaluates to a few emoji.

'\ud83d\udc0e\ud83d\udc71\u2764'
// <- '🐎👱❤'

While that string consists of five code units, we know that the length should really be 3—​as there are only three emoji.

'\ud83d\udc0e\ud83d\udc71\u2764'.length
// <- 5
'🐎👱❤'.length
// <- 5

Counting code points before ES6 was tricky, as the language didn’t make an effort to help in the Unicode department. Take for instance Object.keys, as seen in the following code snippet. It returns five keys for our three-emoji string, because those three code points use five code units in total.

Object.keys('🐎👱❤')
// <- ['0', '1', '2', '3', '4']

If we now consider a for loop, we can observe more clearly how this is a problem. In the following example, we wanted to extract each individual emoji from the text string, but we got each code unit instead of the code points they form.

const text = '🐎👱❤'
for (let i = 0; i < text.length; i++) {
  console.log(text[i])
  // <- '\ud83d'
  // <- '\udc0e'
  // <- '\ud83d'
  // <- '\udc71'
  // <- '\u2764'
}

Luckily for us, in ES6 strings adhere to the iterable protocol. We can use the string iterator to go over code points, even when those code points are made of surrogate pairs.

String.prototype[Symbol.iterator]

Given the problems with looping by code units, the iterators produced by String.prototype[Symbol.iterator] yield code points instead.

for (const codePoint of '🐎👱❤') {
  console.log(codePoint)
  // <- '🐎'
  // <- '👱'
  // <- '❤'
}

Measuring the length of a string in terms of code points, as we saw earlier, is impossible with String#length, because it counts code units instead. We can, however, use an iterator to split the string into its code points, like we did in the for..of example.

We could use the spread operator, which relies on the iterator protocol, to split a string into an array made up of its constituent code points and then pull that array's length, getting the correct code point count, as seen next.

[...'🐎👱❤'].length
// <- 3

Keep in mind that splitting strings into code points isn't enough if you want to be 100% precise about string length. Take for instance the combining overline Unicode code point, represented with \u0305. On its own, this code point is just an overline, as shown next.

'\u0305'
// <- ' ̅'

When preceded by another code point, however, the two are combined into a single glyph.

function overlined(text) {
  return `${ text }\u0305`
}

overlined('o')
// <- 'o̅'
'hello world'.split('').map(overlined).join('')
// <- 'h̅e̅l̅l̅o̅ ̅w̅o̅r̅l̅d̅'

Attempts to naïvely figure out the actual length by counting code points prove insufficient, just like when using String#length to count code points, as shown next.

'o̅'.length
// <- 2
[...'o̅'].length
// <- 2, should be 1
[...'h̅e̅l̅l̅o̅ ̅w̅o̅r̅l̅d̅'].length
// <- 22, should be 11
[...'h̅e̅l̅l̅o̅ world'].length
// <- 16, should be 11

As Unicode expert Mathias Bynens points out, splitting by code points isn't enough. Unlike surrogate pairs like the emojis we've used in our earlier examples, other grapheme clusters aren't taken into account by the string iterator. (I recommend you read "JavaScript has a Unicode problem" from Mathias Bynens, where he analyzes JavaScript's relationship with Unicode.) In those cases we're out of luck, and have to fall back to regular expressions or utility libraries to correctly calculate string length.

A Proposal to Split Grapheme Segments

Multiple code points that combine into a single visual glyph are getting more common. Emoji popularized this, with glyphs sometimes made up of four code points (see this list of emoji made up of several code points). There is a new proposal in the works (currently in stage 2) that may settle the matter of iterating over grapheme clusters once and for all. It introduces an Intl.Segmenter built-in, which can be used to split a string into an iterable sequence.

To use the Segmenter API, we start by creating an instance of Intl.Segmenter specifying a locale and the granularity level we want: per grapheme, word, sentence, or line. The segmenter instance can be used to produce an iterator for any given string, splitting it by the specified granularity. Note that the segmenting algorithm may vary depending on the locale, which is why it is a part of the API.

The following example defines a getGraphemes function that produces an array of grapheme clusters for any given locale and piece of text.

function getGraphemes(locale, text) {
  const segmenter = new Intl.Segmenter(locale, {
    granularity: 'grapheme'
  })
  const sequence = segmenter.segment(text)
  const graphemes = [...sequence].map(item => item.segment)
  return graphemes
}
getGraphemes('es', 'Esto está bien bueno!')

Using the Segmenter proposal, we wouldn't have any trouble splitting strings containing emoji or combining code points.
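As a rough sketch of how this could be put to work for measuring strings, the hypothetical countGraphemes helper below counts grapheme clusters when the runtime exposes Intl.Segmenter, and falls back to counting code points otherwise. The helper name, the default locale, and the fallback strategy are assumptions made for the sake of the example.

```javascript
// Hypothetical helper: counts grapheme clusters, falling back to
// counting code points when Intl.Segmenter isn't available
function countGraphemes(text, locale = 'en') {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return [...text].length
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' })
  return [...segmenter.segment(text)].length
}

const overlinedLetter = 'o\u0305'
console.log(overlinedLetter.length)
// <- 2, counting code units
console.log([...overlinedLetter].length)
// <- 2, counting code points
console.log(countGraphemes(overlinedLetter))
// reports 1 in runtimes where Intl.Segmenter is supported
```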

Let’s look at more Unicode-related methods introduced in ES6.

String#codePointAt

We can use String#codePointAt to get the numeric representation of a code point at a given position in a string. Note that the expected start position is indexed by code unit, not by code point. In the following example we print the code points for each of the three emoji in our demo '🐎👱❤' string.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
text.codePointAt(0)
// <- 0x1f40e
text.codePointAt(2)
// <- 0x1f471
text.codePointAt(4)
// <- 0x2764

Identifying the indices that need to be provided to String#codePointAt may prove cumbersome, which is why you should instead loop through a string iterator that can identify them on your behalf. You can then call .codePointAt(0) for each code point in the sequence, and 0 will always be the correct start index.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
for (const codePoint of text) {
  console.log(codePoint.codePointAt(0))
  // <- 0x1f40e
  // <- 0x1f471
  // <- 0x2764
}

We could also reduce our example to a single line of code by using a combination of the spread operator and Array#map.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
// the leading semicolon keeps [ from being parsed as a property access on the string above
;[...text].map(cp => cp.codePointAt(0))
// <- [0x1f40e, 0x1f471, 0x2764]

You can take the base-16 representation of those base-10 code points, and use them to create a string with the new Unicode code point escape syntax of \u{codePoint}. This syntax allows you to represent Unicode code points that are beyond the BMP. That is, code points outside the [U+0000, U+FFFF] range, which is all that the \u1234 syntax can represent directly.

Let’s start by updating our example to print the hexadecimal version of our code points.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
;[...text].map(cp => cp.codePointAt(0).toString(16))
// <- ['1f40e', '1f471', '2764']

We could wrap those base-16 values in '\u{codePoint}' and voilà: you'd get the emoji values once again.

'\u{1f40e}'
// <- '🐎'
'\u{1f471}'
// <- '👱'
'\u{2764}'
// <- '❤'
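Putting those two steps together, a helper along the following lines could produce the escaped form of any string. The toCodePointEscapes name is hypothetical, and the sketch escapes every code point, even those that could be typed as-is.

```javascript
// Hypothetical helper: turns each code point into its \u{...} escape
function toCodePointEscapes(text) {
  return [...text]
    .map(symbol => symbol.codePointAt(0).toString(16))
    .map(hex => `\\u{${ hex }}`)
    .join('')
}

const escaped = toCodePointEscapes('\ud83d\udc0e\ud83d\udc71\u2764')
console.log(escaped)
// prints \u{1f40e}\u{1f471}\u{2764}
```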

String.fromCodePoint

This method takes in a code point, as a number, and returns a string containing the corresponding symbol. Note how I can use the 0x prefix with the terse base-16 code points we got from String#codePointAt moments ago.

String.fromCodePoint(0x1f40e)
// <- '🐎'
String.fromCodePoint(0x1f471)
// <- '👱'
String.fromCodePoint(0x2764)
// <- '❤'

You can just as well use plain base-10 literals and achieve the same results.

String.fromCodePoint(128014)
// <- '🐎'
String.fromCodePoint(128113)
// <- '👱'
String.fromCodePoint(10084)
// <- '❤'

You can pass in as many code points as you’d like to String.fromCodePoint.

String.fromCodePoint(0x1f40e, 0x1f471, 0x2764)
// <- '🐎👱❤'

As an exercise in futility, we could map a string to the numeric representation of its code points, and back to the code points themselves.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
;[...text]
  .map(cp => cp.codePointAt(0))
  .map(cp => String.fromCodePoint(cp))
  .join('')
// <- '🐎👱❤'
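It may help to contrast this method with its older sibling, String.fromCharCode, which works on code units. The sketch below shows how the older method silently keeps only the lowest 16 bits of its argument, whereas String.fromCodePoint emits a full surrogate pair and rejects values outside of the Unicode range.

```javascript
const horse = 0x1f40e

// String.fromCharCode only keeps the lowest 16 bits: 0x1f40e becomes 0xf40e
const truncated = String.fromCharCode(horse)
console.log(truncated === '\uf40e')
// <- true

// String.fromCodePoint produces the full surrogate pair
const complete = String.fromCodePoint(horse)
console.log(complete === '\ud83d\udc0e')
// <- true

// Values outside the Unicode range are rejected rather than truncated
let error = null
try {
  String.fromCodePoint(0x110000)
} catch (e) {
  error = e
}
console.log(error instanceof RangeError)
// <- true
```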

Reversing a string has potential to cause issues as well.

Unicode-Aware String Reversal

Consider the following piece of code.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
text.split('').map(cp => cp.codePointAt(0))
// <- [55357, 56334, 55357, 56433, 10084]
text.split('').reverse().map(cp => cp.codePointAt(0))
// <- [10084, 56433, 55357, 56334, 55357]

The problem is that we’re reversing individual code units, while we’d have to reverse code points for a correct solution. If, instead, we were to use the spread operator to split the string by its code points, and then reversed that, the code points would be preserved and the string would be properly reversed.

const text = '\ud83d\udc0e\ud83d\udc71\u2764'
;[...text].reverse().join('')
// <- '❤👱🐎'

This way we avoid breaking up code points. Once again, keep in mind that this won’t work for all grapheme clusters.

[...'hello\u0305'].reverse().join('')
// <- '\u0305olleh', where the overline is no longer attached to the o
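One way to mitigate this particular case is to keep combining marks attached to the code point they follow before reversing. The reverseKeepingMarks helper below is a hypothetical sketch that assumes a runtime with support for Unicode property escapes, which are discussed later in this chapter. It only handles combining marks, so other kinds of grapheme clusters would still be broken up.

```javascript
// Hypothetical helper: reverses a string while keeping combining marks
// (Unicode general category M) attached to the code point they follow
function reverseKeepingMarks(text) {
  // a non-mark code point plus any marks after it, or a run of stray marks
  const clusters = text.match(/\P{M}\p{M}*|\p{M}+/gu) || []
  return clusters.reverse().join('')
}

console.log(reverseKeepingMarks('hello\u0305') === 'o\u0305lleh')
// <- true
console.log(
  reverseKeepingMarks('\ud83d\udc0e\ud83d\udc71\u2764') ===
  '\u2764\ud83d\udc71\ud83d\udc0e'
)
// <- true
```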

The last Unicode-related method we’ll be addressing is .normalize.

String#normalize

There are different ways of representing strings that look identical to humans even though their code points differ. Consider the following example, where two seemingly identical strings aren’t deemed equal by any JavaScript runtime.

'mañana' === 'mañana'
// <- false

What's going on here? We have an ñ in the version on the left, while the version on the right has an n followed by a combining tilde character (U+0303). The two are visually identical, but if we take a look at the code points, we'll notice they're different.

[...'mañana'].map(cp => cp.codePointAt(0).toString(16))
// <- ['6d', '61', 'f1', '61', '6e', '61']
[...'mañana'].map(cp => cp.codePointAt(0).toString(16))
// <- ['6d', '61', '6e', '303', '61', '6e', '61']

Just like with the 'hello̅' examples, the second string has a length of 7, even though visually it is also 6 glyphs long.

[...'mañana'].length
// <- 6
[...'mañana'].length
// <- 7

If we normalize the second version, using String#normalize, we’ll get back the same code points we had in the first version.

const normalized = 'mañana'.normalize()
;[...normalized].map(cp => cp.codePointAt(0).toString(16))
// <- ['6d', '61', 'f1', '61', '6e', '61']
normalized.length
// <- 6

Note that we should use String#normalize on both strings when comparing them if we want to test for equality.

function compare(left, right) {
  return left.normalize() === right.normalize()
}
const normal = 'mañana'
const irregular = 'mañana'
normal === irregular
// <- false
compare(normal, irregular)
// <- true
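When called without arguments, String#normalize uses the NFC form, which composes characters wherever possible. The method also accepts a form name such as 'NFD', which decomposes them instead. The following sketch spells out both versions of the string using escapes, so that the difference survives copying and pasting.

```javascript
const composed = 'ma\u00f1ana' // ñ as a single code point
const decomposed = 'man\u0303ana' // n followed by a combining tilde

// NFC, the default form, composes characters where possible
console.log([...decomposed.normalize('NFC')].length)
// <- 6
console.log(decomposed.normalize('NFC') === composed)
// <- true

// NFD decomposes them instead
console.log([...composed.normalize('NFD')].length)
// <- 7
console.log(composed.normalize('NFD') === decomposed)
// <- true
```

Either form works for comparisons, as long as both sides are normalized to the same one.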

Regular Expressions

In this section we'll take a look at regular expressions in and after ES6. There are a couple of regular expression flags that were introduced in ES6: the /y, or sticky flag, and the /u, or Unicode flag. Then we'll discuss five proposals that are making their way through the ECMAScript specification development process at TC39.

Sticky Matching Flag /y

The sticky matching y flag introduced in ES6 is similar to the global g flag. Like global regular expressions, sticky ones are typically used to match several times until the input string is exhausted. Sticky regular expressions move lastIndex to the position after the last match, just like global regular expressions. The only difference is that a sticky regular expression must start matching where the previous match left off, unlike global regular expressions that move onto the rest of the input string when the regular expression goes unmatched at any given position.

The following example illustrates the difference between the two. Given an input string like 'haha haha haha' and the /ha/ regular expression, the global flag will match every occurrence of 'ha', while the sticky flag will only match the first two, since the third occurrence doesn’t match starting at index 4, but rather at index 5.

function matcher(regex, input) {
  return () => {
    const match = regex.exec(input)
    const lastIndex = regex.lastIndex
    return { lastIndex, match }
  }
}
const input = 'haha haha haha'
const nextGlobal = matcher(/ha/g, input)
console.log(nextGlobal()) // <- { lastIndex: 2, match: ['ha'] }
console.log(nextGlobal()) // <- { lastIndex: 4, match: ['ha'] }
console.log(nextGlobal()) // <- { lastIndex: 7, match: ['ha'] }
const nextSticky = matcher(/ha/y, input)
console.log(nextSticky()) // <- { lastIndex: 2, match: ['ha'] }
console.log(nextSticky()) // <- { lastIndex: 4, match: ['ha'] }
console.log(nextSticky()) // <- { lastIndex: 0, match: null }

We can verify that the sticky matcher would work if we forcefully moved lastIndex with the next piece of code.

const rsticky = /ha/y
const nextSticky = matcher(rsticky, input)
console.log(nextSticky()) // <- { lastIndex: 2, match: ['ha'] }
console.log(nextSticky()) // <- { lastIndex: 4, match: ['ha'] }
rsticky.lastIndex = 5
console.log(nextSticky()) // <- { lastIndex: 7, match: ['ha'] }

Sticky matching was added to JavaScript as a way of improving the performance of lexical analyzers in compilers, which heavily rely on regular expressions.
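To get a feel for that use case, consider the following minimal tokenizer sketch. The tokenize function, its rule names, and the tiny arithmetic grammar are all made up for illustration. Each rule is a sticky regular expression, so a rule can only match at the exact position where the previous token ended, and anything unrecognized is reported right away instead of being skipped over.

```javascript
// Hypothetical tokenizer for simple arithmetic expressions
function tokenize(input) {
  const rules = [
    ['number', /\d+/y],
    ['operator', /[+\-*\/]/y],
    ['space', /\s+/y]
  ]
  const tokens = []
  let position = 0
  while (position < input.length) {
    let matched = false
    for (const [type, regex] of rules) {
      regex.lastIndex = position // sticky rules match here or not at all
      const match = regex.exec(input)
      if (match === null) {
        continue
      }
      if (type !== 'space') {
        tokens.push({ type, value: match[0] })
      }
      position = regex.lastIndex
      matched = true
      break
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at index ${ position }`)
    }
  }
  return tokens
}

console.log(tokenize('12 + 34*5').map(token => token.value))
// <- ['12', '+', '34', '*', '5']
```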

Unicode Flag /u

ES6 also introduced a u flag. The u stands for Unicode, but this flag can also be thought of as a more strict version of regular expressions.

Without the u flag, the following snippet has a regular expression containing an 'a' character literal that was unnecessarily escaped.

/\a/.test('ab')
// <- true

Using an escape sequence for an unreserved character such as a in a regular expression with the u flag results in an error, as shown in the following bit of code.

/\a/u.test('ab')
// <- SyntaxError: Invalid escape: /\a/

The following example attempts to embed the horse emoji in a regular expression by way of the \u{1f40e} notation that ES6 introduced for strings like '\u{1f40e}', but the regular expression fails to match against the horse emoji. Without the u flag, the \u{…} pattern is interpreted as having an unnecessarily escaped u character followed by the rest of the sequence.

/\u{1f40e}/.test('🐎') // <- false
/\u{1f40e}/.test('u{1f40e}') // <- true

The u flag introduces support for Unicode code point escapes, like the \u{1f40e} horse emoji, within regular expressions.

/\u{1f40e}/u.test('🐎')
// <- true

Without the u flag, the . pattern matches any BMP symbol except for line terminators. The following example tests U+1D11E MUSICAL SYMBOL G CLEF, an astral symbol that doesn’t match the dot pattern in plain regular expressions.

const rdot = /^.$/
rdot.test('a') // <- true
rdot.test('\n') // <- false
rdot.test('\u{1d11e}') // <- false

When using the u flag, Unicode symbols that aren’t on the BMP are matched as well. The next snippet shows how the astral symbol matches when the flag is set.

const rdot = /^.$/u
rdot.test('a') // <- true
rdot.test('\n') // <- false
rdot.test('\u{1d11e}') // <- true

When the u flag is set, similar Unicode awareness improvements can be found in quantifiers and character classes, both of which treat each Unicode code point as a single symbol, instead of matching on the first code unit only. Insensitive case matching with the i flag performs Unicode case folding when the u flag is set as well, which is used to normalize code points in both the input string and the regular expression. For more details around the u flag in regular expressions, read "Unicode-aware regular expressions in ECMAScript 6" from Mathias Bynens.
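The difference in quantifiers and character classes can be sketched with the horse emoji once more. Without the u flag, the pattern is interpreted one code unit at a time, so a quantifier or a character class ends up acting on half of a surrogate pair.

```javascript
const horse = '\u{1f40e}'

// Without u, {2} applies only to the second code unit of the pair
console.log(/^\ud83d\udc0e{2}$/.test(horse + horse))
// <- false
// With u, the whole code point is repeated
console.log(/^\u{1f40e}{2}$/u.test(horse + horse))
// <- true

// Without u, the character class holds two separate code units
console.log(/^[\ud83d\udc0e]$/.test(horse))
// <- false
// With u, the class holds a single code point
console.log(/^[\u{1f40e}]$/u.test(horse))
// <- true
```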

Named Capture Groups

Up until now, JavaScript regular expressions could group matches in numbered capturing groups and noncapturing groups. In the next snippet we’re using a couple of groups to extract a key and value from an input string containing a key/value pair delimited by '='.

function parseKeyValuePair(input) {
  const rattribute = /([a-z]+)=([a-z]+)/
  const [, key, value] = rattribute.exec(input)
  return { key, value }
}
parseKeyValuePair('strong=true')
// <- { key: 'strong', value: 'true' }

There are also noncapturing groups, which are discarded and not present in the final result, but are still useful for matching. The following example supports input with key/value pairs delimited by ' is ' in addition to '='.

function parseKeyValuePair(input) {
  const rattribute = /([a-z]+)(?:=|\sis\s)([a-z]+)/
  const [, key, value] = rattribute.exec(input)
  return { key, value }
}
parseKeyValuePair('strong is true')
// <- { key: 'strong', value: 'true' }
parseKeyValuePair('flexible=too')
// <- { key: 'flexible', value: 'too' }

While array destructuring in the previous example hid our code's reliance on magic array indices, the fact remains that matches are placed in an ordered array regardless. The named capture groups proposal (in stage 3 at the time of this writing; check out the named capture groups proposal document) adds syntax like (?<groupName>…) to regular expressions, where we can name capturing groups, which are then returned in a groups property of the returned match object. The groups property can then be destructured from the resulting object when calling RegExp#exec or String#match.

function parseKeyValuePair(input) {
  const rattribute = (
    /(?<key>[a-z]+)(?:=|\sis\s)(?<value>[a-z]+)/
  )
  const { groups } = rattribute.exec(input)
  return groups
}
parseKeyValuePair('strong=true')
// <- { key: 'strong', value: 'true' }
parseKeyValuePair('flexible=too')
// <- { key: 'flexible', value: 'too' }
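Keep in mind that RegExp#exec returns null when the input doesn't match, in which case destructuring groups right off the result would throw. The following variation, under the hypothetical name safeParseKeyValuePair, is a sketch that guards against that case, assuming a runtime where named capture groups are implemented.

```javascript
// Hypothetical variant that returns null instead of throwing on bad input
function safeParseKeyValuePair(input) {
  const rattribute = /(?<key>[a-z]+)(?:=|\sis\s)(?<value>[a-z]+)/
  const match = rattribute.exec(input)
  if (match === null) {
    return null
  }
  // copy the named groups into a plain object
  const { key, value } = match.groups
  return { key, value }
}

console.log(safeParseKeyValuePair('strong is true'))
// <- { key: 'strong', value: 'true' }
console.log(safeParseKeyValuePair('12345'))
// <- null
```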

JavaScript regular expressions support backreferences, where captured groups can be reused to look for duplicates. The following snippet uses a backreference for the first capturing group to identify cases where a username is the same as a password in a piece of 'user:password' input.

function hasSameUserAndPassword(input) {
  const rduplicate = /([^:]+):\1/
  return rduplicate.exec(input) !== null
}
hasSameUserAndPassword('root:root') // <- true
hasSameUserAndPassword('root:pF6GGlyPhoy1!9i') // <- false

The named capture groups proposal adds support for named backreferences, which refer back to named groups.

function hasSameUserAndPassword(input) {
  const rduplicate = /(?<user>[^:]+):\k<user>/u
  return rduplicate.exec(input) !== null
}
hasSameUserAndPassword('root:root') // <- true
hasSameUserAndPassword('root:pF6GGlyPhoy1!9i') // <- false

The \k<groupName> reference can be used in tandem with numbered references, but the latter are better avoided when already using named references.

Lastly, named groups can be referenced from the replacement passed to String#replace. In the next code snippet we use String#replace and named groups to change an American date string to use Hungarian formatting.

function americanDateToHungarianFormat(input) {
  const ramerican = (
    /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/
  )
  const hungarian = input.replace(
    ramerican,
    '$<year>-$<month>-$<day>'
  )
  return hungarian
}
americanDateToHungarianFormat('06/09/1988')
// <- '1988-06-09'

If the second argument to String#replace is a function, then the named groups can be accessed via a new parameter called groups that is at the end of the parameter list. The signature for that function now is (match, …​captures, groups). In the following example, note how we’re using a template literal that’s similar to the replacement string found in the last example. The fact that replacement strings follow a $<groupName> syntax as opposed to a `${ groupName }` syntax means we can name groups in replacement strings without having to resort to escape codes if we were using template literals.

function americanDateToHungarianFormat(input) {
  const ramerican = (
    /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/
  )
  const hungarian = input.replace(ramerican, (...rest) => {
    const groups = rest[rest.length - 1]
    const { month, day, year } = groups
    return `${ year }-${ month }-${ day }`
  })
  return hungarian
}
americanDateToHungarianFormat('06/09/1988') // <- '1988-06-09'

Unicode Property Escapes

The proposed Unicode property escapes (currently in stage 3; check out the Unicode property escapes proposal document) are a new kind of escape sequence that's available in regular expressions marked with the u flag. This proposal adds an escape in the form of \p{LoneUnicodePropertyNameOrValue} for binary Unicode properties and \p{UnicodePropertyName=UnicodePropertyValue} for nonbinary Unicode properties. In addition, \P is the negated version of a \p escape sequence.

The Unicode standard defines properties for every symbol. Armed with these properties, one may make advanced queries about Unicode characters. For example, symbols in the Greek alphabet have a Script property set to Greek. We could use the new escapes to match any Greek Unicode symbol.

function isGreekSymbol(input) {
  const rgreek = /^\p{Script=Greek}$/u
  return rgreek.test(input)
}
isGreekSymbol('π')
// <- true

Or, using \P, we could match non-Greek Unicode symbols.

function isNonGreekSymbol(input) {
  const rgreek = /^\P{Script=Greek}$/u
  return rgreek.test(input)
}
isNonGreekSymbol('π')
// <- false

When we need to match every Unicode decimal number symbol, and not just [0-9] like \d does, we could use \p{Decimal_Number} as shown next.

function isDecimalNumber(input) {
  const rdigits = /^\p{Decimal_Number}+$/u
  return rdigits.test(input)
}
isDecimalNumber('𝟏𝟐𝟑𝟜𝟝')
// <- true

Check out this exhaustive overview of supported Unicode properties and values.
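General categories work in much the same way. As a sketch, the hypothetical getWords helper below relies on \p{L}, the Letter category, to extract runs of letters regardless of script, which is something [a-z] and \w can't do. It assumes a runtime where property escapes are implemented, and the input is written with escapes to keep the example unambiguous.

```javascript
// Hypothetical helper: extracts runs of letters in any script
function getWords(input) {
  return input.match(/\p{L}+/gu) || []
}

// 'mañana será π-day', spelled out using escapes
const words = getWords('ma\u00f1ana ser\u00e1 \u03c0-day')
console.log(words)
// <- ['mañana', 'será', 'π', 'day']
console.log('ma\u00f1ana'.match(/[a-z]+/g))
// <- ['ma', 'ana']
```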

Lookbehind Assertions

JavaScript has had positive lookahead assertions for a long time. That feature allows us to match an expression but only if it’s followed by another expression. These assertions are expressed as (?=…). Regardless of whether a lookahead assertion matches, the results of that match are discarded and no characters of the input string are consumed.

The following example uses a positive lookahead to test whether an input string has a sequence of letters followed by .js, in which case it returns the filename without the .js part.

function getJavaScriptFilename(input) {
  const rfile = /^(?<filename>[a-z]+)(?=\.js)\.[a-z]+$/u
  const match = rfile.exec(input)
  if (match === null) {
    return null
  }
  return match.groups.filename
}
getJavaScriptFilename('index.js') // <- 'index'
getJavaScriptFilename('index.php') // <- null

There are also negative lookahead assertions, which are expressed as (?!…) as opposed to (?=…) for positive lookaheads. In this case, the assertion succeeds only if the lookahead expression isn’t matched. The next bit of code uses a negative lookahead and we can observe how the results are flipped: now any expression other than '.js' results in a passed assertion.

function getNonJavaScriptFilename(input) {
  const rfile = /^(?<filename>[a-z]+)(?!\.js)\.[a-z]+$/u
  const match = rfile.exec(input)
  if (match === null) {
    return null
  }
  return match.groups.filename
}
getNonJavaScriptFilename('index.js') // <- null
getNonJavaScriptFilename('index.php') // <- 'index'

The proposal for lookbehind (stage 3; check out the lookbehind assertions proposal document) introduces positive and negative lookbehind assertions, denoted with (?<=…) and (?<!…), respectively. These assertions can be used to ensure a pattern we want to match is or isn't preceded by another given pattern. The following snippet uses a positive lookbehind to match the digits in dollar amounts, but not for amounts in euros.

function getDollarAmount(input) {
  // no ^ anchor here: a lookbehind at the very start of the input could never find a '$'
  const rdollars = /(?<=\$)(?<amount>\d+(?:\.\d+)?)$/u
  const match = rdollars.exec(input)
  if (match === null) {
    return null
  }
  return match.groups.amount
}
getDollarAmount('$12.34') // <- '12.34'
getDollarAmount('€12.34') // <- null

On the other hand, a negative lookbehind could be used to match numbers that aren’t preceded by a dollar sign.

function getNonDollarAmount(input) {
  // rejecting digits and dots as well keeps the match from starting midway through '$12.34'
  const rnumbers = /(?<![$\d.])(?<amount>\d+(?:\.\d+)?)$/u
  const match = rnumbers.exec(input)
  if (match === null) {
    return null
  }
  return match.groups.amount
}
getNonDollarAmount('$12.34') // <- null
getNonDollarAmount('€12.34') // <- '12.34'
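Lookbehind assertions are also handy when scanning longer pieces of text with the g flag. The following sketch pulls dollar amounts, and then non-dollar amounts, out of a made-up price list. It assumes a runtime where lookbehind is implemented. The negative lookbehind rejects digits and dots in addition to the dollar sign, so that a match can't begin in the middle of a dollar amount.

```javascript
// '$10 €20 $30.50', with the euro sign written as an escape
const prices = '$10 \u20ac20 $30.50'

// amounts preceded by a dollar sign
const dollars = prices.match(/(?<=\$)\d+(?:\.\d+)?/g)
console.log(dollars)
// <- ['10', '30.50']

// amounts that aren't preceded by a dollar sign, a digit, or a dot
const others = prices.match(/(?<![$\d.])\d+(?:\.\d+)?/g)
console.log(others)
// <- ['20']
```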

A New /s "dotAll" Flag

When using the . pattern, we typically expect to match every single character. In JavaScript, however, a . expression doesn’t match astral characters (which can be fixed by adding the u flag) nor line terminators.

const rcharacter = /^.$/
rcharacter.test('a') // <- true
rcharacter.test('\t') // <- true
rcharacter.test('\n') // <- false

This sometimes drives developers to write other kinds of expressions to synthesize a pattern that matches any character. The expression in the next bit of code matches any character that’s either a whitespace character or a nonwhitespace character, delivering the behavior we’d expect from the . pattern matcher.

const rcharacter = /^[\s\S]$/
rcharacter.test('a') // <- true
rcharacter.test('\t') // <- true
rcharacter.test('\n') // <- true

The dotAll proposal (stage 3; check out the dotAll flag proposal document) adds an s flag, which changes the behavior of . in JavaScript regular expressions to match any single character, including line terminators.

const rcharacter = /^.$/s
rcharacter.test('a') // <- true
rcharacter.test('\t') // <- true
rcharacter.test('\n') // <- true
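A typical use case is matching content that spans several lines. The sketch below tests a made-up multiline HTML comment with and without the flag, assuming a runtime where the s flag is implemented. The proposal also adds a dotAll property to regular expression instances, which reports whether the flag was set.

```javascript
const comment = '<!--\nmultiline\n-->'

// Without s, the dot stops at the first line terminator
console.log(/^<!--.*-->$/.test(comment))
// <- false

const rcomment = /^<!--.*-->$/s
console.log(rcomment.test(comment))
// <- true
console.log(rcomment.dotAll)
// <- true
console.log(rcomment.flags)
// <- 's'
```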

String#matchAll

Often, when we have a regular expression with a global or sticky flag, we want to iterate over the set of captured groups for each match. Currently, it can be a bit of a hassle to produce the list of matches: we need to collect the captured groups using String#match or RegExp#exec in a loop, until the regular expression doesn’t match the input starting at the lastIndex position property. In the following piece of code, the parseAttributes generator function does just that for a given regular expression.

function* parseAttributes(input) {
  const rattributes = /(\w+)="([^"]+)"\s/ig
  while (true) {
    const match = rattributes.exec(input)
    if (match === null) {
      break
    }
    const [ , key, value] = match
    yield [key, value]
  }
}
const html = '<input type="email" ' +
  'placeholder="hello@mjavascript.com" />'
console.log([...parseAttributes(html)])
// [
//   ['type', 'email'],
//   ['placeholder', 'hello@mjavascript.com']
// ]

One problem with this approach is that it’s tailor-made for our regular expression and its capturing groups. We could fix that issue by creating a matchAll generator that is only concerned about looping over matches and collecting sets of captured groups, as shown in the following snippet.

function* matchAll(regex, input) {
  while (true) {
    const match = regex.exec(input)
    if (match === null) {
      break
    }
    const [ , ...captures] = match
    yield captures
  }
}
function* parseAttributes(input) {
  const rattributes = /(\w+)="([^"]+)"\s/ig
  yield* matchAll(rattributes, input)
}
const html = '<input type="email" ' +
  'placeholder="hello@mjavascript.com" />'
console.log([...parseAttributes(html)])
// [
//   ['type', 'email'],
//   ['placeholder', 'hello@mjavascript.com']
// ]

A bigger source of confusion is that rattributes mutates its lastIndex property on each call to RegExp#exec, which is how it can track the position after the last match. When there are no matches left, lastIndex is reset back to 0. A problem arises when we don’t iterate over all possible matches for a piece of input in one go—​which would reset lastIndex to 0—and then we use the regular expression on a second piece of input, obtaining unexpected results.

While it looks like our matchAll implementation wouldn't fall victim to this given it loops over all matches, it'd be possible to iterate over the generator by hand, meaning that we'd run into trouble if we reused the same regular expression, as shown in the next bit of code. Note how the second matcher should report ['type', 'text'] but instead starts at an index much further ahead than 0, even misreporting the 'placeholder' key as 'laceholder'.

const rattributes = /(\w+)="([^"]+)"\s/ig
const email = '<input type="email" ' +
  'placeholder="hello@mjavascript.com" />'
const emailMatcher = matchAll(rattributes, email)
const address = '<input type="text" ' +
  'placeholder="Enter your business address" />'
const addressMatcher = matchAll(rattributes, address)
console.log(emailMatcher.next().value)
// <- ['type', 'email']
console.log(addressMatcher.next().value)
// <- ['laceholder', 'Enter your business address']

One solution would be to change matchAll so that lastIndex is always 0 when we yield back to the consumer code, while keeping track of lastIndex internally so that we can pick up where we left off in each step of the sequence.

The following piece of code shows that indeed, that’d fix the problems we’re observing. Reusable global regular expressions are often avoided for this very reason: so that we don’t have to worry about resetting lastIndex after every use.

function* matchAll(regex, input) {
  let lastIndex = 0
  while (true) {
    regex.lastIndex = lastIndex
    const match = regex.exec(input)
    if (match === null) {
      break
    }
    lastIndex = regex.lastIndex
    regex.lastIndex = 0
    const [ , ...captures] = match
    yield captures
  }
}
const rattributes = /(\w+)="([^"]+)"\s/ig
const email = '<input type="email" ' +
  'placeholder="hello@mjavascript.com" />'
const emailMatcher = matchAll(rattributes, email)
const address = '<input type="text" ' +
  'placeholder="Enter your business address" />'
const addressMatcher = matchAll(rattributes, address)
console.log(emailMatcher.next().value)
// <- ['type', 'email']
console.log(addressMatcher.next().value)
// <- ['type', 'text']
console.log(emailMatcher.next().value)
// <- ['placeholder', 'hello@mjavascript.com']
console.log(addressMatcher.next().value)
// <- ['placeholder', 'Enter your business address']
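Another way to sidestep shared lastIndex state, sketched here as an assumption rather than as the approach this chapter takes, is to let the generator work on a private copy of the regular expression. The matchAllCloned name is hypothetical. It relies on the ES6 RegExp constructor accepting an existing regular expression and copying its source and flags, and it adds two guards that the earlier implementation doesn’t have: one for regular expressions without the g flag, and one for empty matches.

```javascript
// Hypothetical variant: iterate with a private copy of the regex
function* matchAllCloned(regex, input) {
  if (!regex.global) {
    throw new TypeError('matchAllCloned expects a regex with the g flag')
  }
  // The copy starts with lastIndex at 0 and never touches the original
  const local = new RegExp(regex)
  while (true) {
    const match = local.exec(input)
    if (match === null) {
      break
    }
    if (match[0] === '') {
      // Step past empty matches so the loop always makes progress
      local.lastIndex++
    }
    const [ , ...captures] = match
    yield captures
  }
}
const rattributes = /(\w+)="([^"]+)"\s/ig
const email = '<input type="email" placeholder="hello@mjavascript.com" />'
const address = '<input type="text" placeholder="Enter your business address" />'
const emailMatcher = matchAllCloned(rattributes, email)
const addressMatcher = matchAllCloned(rattributes, address)
console.log(emailMatcher.next().value)
// <- ['type', 'email']
console.log(addressMatcher.next().value)
// <- ['type', 'text']
console.log(rattributes.lastIndex)
// <- 0
```

The trade-off is one extra RegExp allocation per generator, in exchange for never having to save and restore lastIndex around each yield.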

The String#matchAll proposal (in stage 1 at the time of this writing; see the String#matchAll proposal document) introduces a new method for the string prototype that would behave in a similar fashion to our matchAll implementation, except that the returned iterable is a sequence of entire match objects, as opposed to just the numbered captures in the preceding example. This means we could access named captures through match.groups for each match in the sequence.

const rattributes = /(?<key>\w+)="(?<value>[^"]+)"\s/igu
const email = '<input type="email" placeholder="hello@mjavascript.com" />'
for (const match of email.matchAll(rattributes)) {
  const { groups: { key, value } } = match
  console.log(`${ key }: ${ value }`)
}
// <- type: email
// <- placeholder: hello@mjavascript.com

Array

Over the years, libraries like Underscore and Lodash spoke loudly of missing features when it came to arrays. As a result, ES5 brought in heaps of functional methods to arrays: Array#filter, Array#map, Array#reduce, Array#reduceRight, Array#forEach, Array#some, and Array#every.

ES6 brings a few more methods that will help manipulate, fill, and filter arrays.

Array.from

Before ES6, JavaScript developers often needed to cast arguments to a function into an array.

function cast() {
  return Array.prototype.slice.call(arguments)
}
cast('a', 'b')
// <- ['a', 'b']

We’ve already explored more terse ways of doing this in [es6-essentials], when we first learned about rest and spread. You could, for instance, use the spread operator. As you no doubt remember, the spread operator leverages the iterator protocol to produce a sequence of values from arbitrary objects. The downside is that the objects we want to cast with spread must adhere to the iterator protocol by having implemented Symbol.iterator. Luckily for us, arguments does implement the iterator protocol in ES6.

function cast() {
  return [...arguments]
}
cast('a', 'b')
// <- ['a', 'b']

Using the function rest parameter would be better for this particular case as it wouldn’t involve the arguments object, nor any added logic in the function body.

function cast(...params) {
  return params
}
cast('a', 'b')
// <- ['a', 'b']

You may also want to cast NodeList DOM element collections, like those returned from document.querySelectorAll, through the spread operator. This can be helpful when we need access to native array methods like Array#map or Array#filter. This is possible because the DOM standard upgraded NodeList to an iterable, after ES6 defined the iterator protocol.

[...document.querySelectorAll('div')]
// <- [<div>, <div>, <div>, …]

What happens when we try to cast a jQuery collection through the spread operator? If you’re on a modern version of jQuery that implements the iterator protocol, spreading a jQuery object will work; otherwise, you may get an exception.

[...$('div')]
// <- [<div>, <div>, <div>, …]

The new Array.from method is a bit different. It doesn’t rely only on the iterator protocol to figure out how to pull values from an object. Unlike the spread operator, it supports array-likes out of the box. The following code snippet will work with any version of jQuery.

Array.from($('div'))
// <- [<div>, <div>, <div>, …]

The one thing you cannot do with either Array.from or the spread operator is to pick a start index. Suppose you wanted to pull every <div> after the first one. With Array#slice, you could do the following.

[].slice.call(document.querySelectorAll('div'), 1)

Of course, there’s nothing stopping you from using Array#slice after casting. This is a bit easier to read than the previous example, as it keeps the slice call closer to the index at which we want to slice the array.

Array.from(document.querySelectorAll('div')).slice(1)

Array.from has three arguments, although only the input is required. To wit:

  • input—the array-like or iterable object you want to cast

  • map—a mapping function that’s executed on every item of input

  • context—the this binding to use when calling map

With Array.from you cannot slice, but you can dice. The map function will efficiently map the values into something else as they’re being added to the array that results from calling Array.from.

function typesOf() {
  return Array.from(arguments, value => typeof value)
}
typesOf(null, [], NaN)
// <- ['object', 'object', 'number']

Do note that, for the specific case of dealing with arguments, you could also combine rest parameters and Array#map. In this case in particular, we may be better off just doing something like the snippet of code found next. It’s not as verbose as the previous example. Like with the Array#slice example we saw earlier, the mapping is more explicit in this case.

function typesOf(...all) {
  return all.map(value => typeof value)
}
typesOf(null, [], NaN)
// <- ['object', 'object', 'number']
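None of the examples so far exercise the third argument, context. The following is a minimal sketch of how it could be used; the scaleAll helper and its factor property are hypothetical names made up for illustration. Note that the mapping function has to be a regular function, because arrow functions ignore the this binding supplied by Array.from.

```javascript
// Hypothetical helper: scale every value by a factor exposed through `this`
function scaleAll(factor, ...values) {
  const context = { factor }
  // A regular function is required so that `this` refers to context
  return Array.from(values, function (value) {
    return value * this.factor
  }, context)
}
console.log(scaleAll(3, 1, 2, 3))
// <- [3, 6, 9]
```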

When dealing with array-like objects, it makes sense to use Array.from if they don’t implement Symbol.iterator.

const apple = {
  type: 'fruit',
  name: 'Apple',
  amount: 3
}
const onion = {
  type: 'vegetable',
  name: 'Onion',
  amount: 1
}
const groceries = {
  0: apple,
  1: onion,
  length: 2
}
Array.from(groceries)
// <- [apple, onion]
Array.from(groceries, grocery => grocery.type)
// <- ['fruit', 'vegetable']
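To make the difference with the spread operator concrete, the sketch below uses a small array-like (the arrayLike name is just for illustration) that doesn’t implement Symbol.iterator. Spreading it throws a TypeError, while Array.from casts it without complaints. Borrowing the array iterator is one way to make such an object spreadable too.

```javascript
const arrayLike = { 0: 'a', 1: 'b', length: 2 }

let spreadError = null
try {
  // Not iterable: the spread operator has nothing to pull values from
  console.log([...arrayLike])
} catch (error) {
  spreadError = error
}
console.log(spreadError instanceof TypeError)
// <- true
console.log(Array.from(arrayLike))
// <- ['a', 'b']

// Borrowing the array iterator makes the object spreadable as well
arrayLike[Symbol.iterator] = Array.prototype[Symbol.iterator]
console.log([...arrayLike])
// <- ['a', 'b']
```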

Array.of

The Array.of method is exactly like the cast function we played around with earlier. Next is a code snippet that shows how Array.of might be ponyfilled.

function arrayOf(...items) {
  return items
}

The Array constructor has two overloads: ...items, where you provide the items for the new array; and length, where you provide its numeric length. You can think about Array.of as a flavor of new Array that doesn’t support a length overload. In the following code snippet, you’ll find some of the unexpected ways in which new Array behaves, thanks to its single-argument length overload. If you’re confused about the undefined x ${ count } notation in the browser console, that’s indicating there are array holes in those positions. An array with holes is also known as a sparse array.

new Array() // <- []
new Array(undefined) // <- [undefined]
new Array(1) // <- [undefined x 1]
new Array(3) // <- [undefined x 3]
new Array('3') // <- ['3']
new Array(1, 2) // <- [1, 2]
new Array(-1, -2) // <- [-1, -2]
new Array(-1) // <- RangeError: Invalid array length

In contrast, Array.of has more consistent behavior because it doesn’t have the special length case. This makes it a more desirable way of consistently creating new arrays programmatically.

console.log(Array.of()) // <- []
console.log(Array.of(undefined)) // <- [undefined]
console.log(Array.of(1)) // <- [1]
console.log(Array.of(3)) // <- [3]
console.log(Array.of('3')) // <- ['3']
console.log(Array.of(1, 2)) // <- [1, 2]
console.log(Array.of(-1, -2)) // <- [-1, -2]
console.log(Array.of(-1)) // <- [-1]

Array#copyWithin

Let’s start with the signature of Array#copyWithin.

Array.prototype.copyWithin(target, start = 0, end = this.length)

The Array#copyWithin method copies a sequence of array elements within an array instance to the "paste position" starting at target. The elements to be copied are taken from the [start, end) range. The Array#copyWithin method returns the array instance itself.

Let’s lead with a simple example. Consider the items array in the following code snippet.

const items = [1, 2, 3, , , , , , , , ]
// <- [1, 2, 3, undefined x 7]

The function call shown next takes the items array and determines that it’ll start "pasting" items at index 6 (the seventh position). It further determines that the items to be copied will be taken starting at index 1, up to but not including index 3.

const items = [1, 2, 3, , , , , , , , ]
items.copyWithin(6, 1, 3)
// <- [1, 2, 3, undefined × 3, 2, 3, undefined × 2]

Reasoning about Array#copyWithin is hard. Let’s break it down.

If we consider that the items to be copied were taken from the [start, end) range, then we could express that using an Array#slice call. These are the items that were pasted at the target position. We can use .slice to grab the copy.

const items = [1, 2, 3, , , , , , , , ]
const copy = items.slice(1, 3)
// <- [2, 3]

We could also consider the pasting part of the operation as an advanced usage of Array#splice. The next code snippet does just that, passing the paste position to splice, telling it to remove as many items as we want to copy, and inserting the pasted items. Note that we’re using the spread operator so that elements are inserted individually, and not as an array, through .splice.

const items = [1, 2, 3, , , , , , , , ]
const copy = items.slice(1, 3)
// <- [2, 3]
items.splice(6, 3 - 1, ...copy)
console.log(items)
// <- [1, 2, 3, undefined × 3, 2, 3, undefined × 2]

Now that we better understand the internals of Array#copyWithin, we can generalize the example in order to implement the custom copyWithin function shown in the following code snippet.

function copyWithin(
  items,
  target,
  start = 0,
  end = items.length
) {
  const copy = items.slice(start, end)
  const removed = end - start
  items.splice(target, removed, ...copy)
  return items
}

The example we’ve been trying so far would work just as well with our custom copyWithin function.

copyWithin([1, 2, 3, , , , , , , , ], 6, 1, 3)
// <- [1, 2, 3, undefined × 3, 2, 3, undefined × 2]
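Keep in mind that the splice-based function is a mental model rather than a faithful polyfill. The sketch below repeats the same logic under the hypothetical name spliceCopyWithin and compares it with the native method in two cases where they diverge: the native method never changes the length of the array, and it resolves negative start and end values relative to the end of the array before computing how many items to copy.

```javascript
// Same logic as the custom copyWithin above, renamed to avoid clashes
function spliceCopyWithin(items, target, start = 0, end = items.length) {
  const copy = items.slice(start, end)
  items.splice(target, end - start, ...copy)
  return items
}

// Pasting near the end: native truncates the copy, splice grows the array
console.log([1, 2, 3, 4, 5].copyWithin(3, 0, 3))
// <- [1, 2, 3, 1, 2]
console.log(spliceCopyWithin([1, 2, 3, 4, 5], 3, 0, 3))
// <- [1, 2, 3, 1, 2, 3]

// Negative start: native resolves -2 to index 3 before counting items
console.log([1, 2, 3, 4, 5].copyWithin(0, -2))
// <- [4, 5, 3, 4, 5]
console.log(spliceCopyWithin([1, 2, 3, 4, 5], 0, -2))
// <- [4, 5]
```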

Array#fill

Array#fill is a convenient utility method that replaces all items in an array with the provided value. Note that sparse arrays will be filled in their entirety, while existing items will be replaced by the fill value.

['a', 'b', 'c'].fill('x') // <- ['x', 'x', 'x']
new Array(3).fill('x') // <- ['x', 'x', 'x']

You could also specify the starting index and end index. In this case, as shown next, only the items in those positions would be filled.

['a', 'b', 'c', , ,].fill('x', 2)
// <- ['a', 'b', 'x', 'x', 'x']
new Array(5).fill('x', 0, 1)
// <- ['x', undefined x 4]

The provided value can be anything, and is not just limited to primitive values.

new Array(3).fill({})
// <- [{}, {}, {}]
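One detail worth keeping in mind when filling with objects is that every position receives a reference to the same object, not a copy. The sketch below illustrates that, and shows one possible way around it (an assumption about what you might want, not a recommendation from this chapter): fill with undefined first and then map each position to a fresh object.

```javascript
const shared = new Array(3).fill({})
shared[0].touched = true
console.log(shared[0] === shared[1])
// <- true
console.log(shared[2].touched)
// <- true

// fill() with no value plugs the holes, so that map visits every position
const distinct = new Array(3).fill().map(() => ({}))
distinct[0].touched = true
console.log(distinct[0] === distinct[1])
// <- false
console.log(distinct[2].touched)
// <- undefined
```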

You can’t fill arrays using a mapping method that takes an index parameter or anything like that.

const map = i => i * 2
new Array(3).fill(map)
// <- [map, map, map]
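If what you want is to compute each item from its index, one option is to reach for Array.from and its mapping function, or to spread Array#keys and map over the result. The following is a minimal sketch of both; the double function is a made-up example.

```javascript
const double = i => i * 2

// An object with just a length is array-like enough for Array.from
const viaFrom = Array.from({ length: 3 }, (value, i) => double(i))
console.log(viaFrom)
// <- [0, 2, 4]

// Array#keys doesn't skip holes, so this also works for sparse arrays
const viaKeys = [...new Array(3).keys()].map(double)
console.log(viaKeys)
// <- [0, 2, 4]
```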

Array#find and Array#findIndex

The Array#find method runs a callback for each item in an array until the first one for which the callback returns a truthy value, and then returns that item. If no item matches, it returns undefined. The method follows the signature of (callback(item, i, array), context) that’s also present in Array#map, Array#filter, and others. You can think of Array#find as a version of Array#some that returns the matching element instead of just true.

['a', 'b', 'c', 'd', 'e'].find(item => item === 'c')
// <- 'c'
['a', 'b', 'c', 'd', 'e'].find((item, i) => i === 0)
// <- 'a'
['a', 'b', 'c', 'd', 'e'].find(item => item === 'z')
// <- undefined

There’s an Array#findIndex method as well, and it leverages the same signature. Instead of returning a Boolean value, or the element itself, Array#findIndex returns the index of the matching element, or -1 if no matches occur. Here are a few examples.

['a', 'b', 'c', 'd', 'e'].findIndex(item => item === 'c')
// <- 2
['a', 'b', 'c', 'd', 'e'].findIndex((item, i) => i === 0)
// <- 0
['a', 'b', 'c', 'd', 'e'].findIndex(item => item === 'z')
// <- -1
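These methods are most useful with arrays of objects, where Array#indexOf can’t help because it compares by identity. The sketch below uses a made-up groceries list and counts callback invocations to show that the search stops at the first match.

```javascript
const groceries = [
  { name: 'Apple', type: 'fruit' },
  { name: 'Onion', type: 'vegetable' },
  { name: 'Carrot', type: 'vegetable' }
]
let calls = 0
const isVegetable = item => {
  calls++
  return item.type === 'vegetable'
}
console.log(groceries.find(isVegetable).name)
// <- 'Onion'
console.log(calls)
// <- 2
console.log(groceries.findIndex(isVegetable))
// <- 1
console.log(groceries.some(isVegetable))
// <- true
```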

Array#keys

Array#keys returns an iterator that yields a sequence holding the keys for the array. The returned value is an iterator, meaning you can iterate over it with for..of, the spread operator, or by manually calling .next().

['a', 'b', 'c', 'd'].keys()
// <- ArrayIterator {}

Here’s an example using for..of.

for (const key of ['a', 'b', 'c', 'd'].keys()) {
  console.log(key)
  // <- 0
  // <- 1
  // <- 2
  // <- 3
}

Unlike Object.keys, and most methods that iterate over arrays, this sequence doesn’t ignore array holes.

Object.keys(new Array(4))
// <- []
[...new Array(4).keys()]
// <- [0, 1, 2, 3]
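For completeness, here’s a short sketch of the third way of consuming the iterator: calling .next() by hand. Each call returns an object with value and done properties, as dictated by the iterator protocol.

```javascript
const keys = ['a', 'b'].keys()
console.log(keys.next())
// <- { value: 0, done: false }
console.log(keys.next())
// <- { value: 1, done: false }
console.log(keys.next())
// <- { value: undefined, done: true }
```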

Now onto values.

Array#values

Array#values is the same as Array#keys(), but the returned iterator is a sequence of values instead of keys. In practice, you’ll want to iterate over the array itself most of the time, but getting an iterator can come in handy sometimes.

['a', 'b', 'c', 'd'].values()
// <- ArrayIterator {}

You can use for..of or any other methods like a spread operator to pull out the iterable sequence. The following example uses the spread operator on an array’s .values() to create a copy of that array.

[...['a', 'b', 'c', 'd'].values()]
// <- ['a', 'b', 'c', 'd']

Note that omitting the .values() method call would still produce a copy of the array: the sequence is iterated and spread over a new array.

Array#entries

Similar to both preceding methods, except Array#entries returns an iterator with a sequence of key/value pairs.

['a', 'b', 'c', 'd'].entries()
// <- ArrayIterator {}

Each item in the sequence is a two-element array holding the key and the value for an item in the array.

[...['a', 'b', 'c', 'd'].entries()]
// <- [[0, 'a'], [1, 'b'], [2, 'c'], [3, 'd']]
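Where Array#entries tends to come in handy is in for..of loops that need the index alongside the value. The sketch below destructures each pair as it iterates; the letters and lines names are made up for the example.

```javascript
const letters = ['a', 'b', 'c']
const lines = []
// Each entry is an [index, value] pair that we can destructure in place
for (const [index, letter] of letters.entries()) {
  lines.push(`${ index }: ${ letter }`)
}
console.log(lines)
// <- ['0: a', '1: b', '2: c']
```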

Great, one last method left!

Array.prototype[Symbol.iterator]

This is exactly the same as the Array#values method.

const list = ['a', 'b', 'c', 'd']
list[Symbol.iterator] === list.values
// <- true
[...list[Symbol.iterator]()]
// <- ['a', 'b', 'c', 'd']

The following example combines a spread operator, an array, and Symbol.iterator to iterate over its values. Can you follow the code?

[...['a', 'b', 'c', 'd'][Symbol.iterator]()]
// <- ['a', 'b', 'c', 'd']

Let’s break it down. First, there’s the array.

['a', 'b', 'c', 'd']
// <- ['a', 'b', 'c', 'd']

Then we get an iterator.

['a', 'b', 'c', 'd'][Symbol.iterator]()
// <- ArrayIterator {}

Last, we spread the iterator over a new array, creating a copy.

[...['a', 'b', 'c', 'd'][Symbol.iterator]()]
// <- ['a', 'b', 'c', 'd']