In JavaScript, strings are sequences of characters and are one of the most common data types. Comparing strings is a frequent operation in programming, used for tasks such as searching, sorting, and validating user input. In this tutorial, we will cover different methods to compare strings in JavaScript, including the comparison operators, the localeCompare() method, and case-insensitive comparisons.
1. Comparing Strings Using Comparison Operators
In JavaScript, you can compare strings using the standard comparison operators: ==, !=, ===, !==, <, >, <=, and >=. The relational operators (<, >, <=, >=) compare strings lexicographically by the numeric values of their UTF-16 code units, while the equality operators check whether both strings contain exactly the same sequence of code units.
Here’s an example:
const str1 = 'apple';
const str2 = 'banana';
const str3 = 'apple';
console.log(str1 == str2); // false
console.log(str1 == str3); // true
console.log(str1 != str2); // true
console.log(str1 < str2); // true
console.log(str1 > str2); // false
console.log(str1 <= str3); // true
console.log(str1 >= str3); // true
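Because the ordering is based on code unit values rather than dictionary rules, some results can be surprising. The following is a small illustrative sketch (the sample words are made up for this example) showing that uppercase ASCII letters sort before lowercase ones and that digit strings are compared as text, not as numbers:

```javascript
// Relational operators compare UTF-16 code units, so uppercase ASCII letters
// (codes 65-90) sort before lowercase ones (codes 97-122).
const words = ['banana', 'Zebra', 'apple'];

// The default sort() also orders strings by UTF-16 code units.
const byCodeUnit = [...words].sort();
console.log(byCodeUnit); // [ 'Zebra', 'apple', 'banana' ]

console.log('Zebra' < 'apple'); // true: 'Z' (90) < 'a' (97)
console.log('10' < '2');        // true: '1' (49) < '2' (50), compared as text
console.log('app' < 'apple');   // true: a prefix sorts before the longer string
```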
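Note that using == and != can result in unintended behavior due to type coercion when one of the operands is not a string. When both operands are strings, == and === behave identically, but it is still recommended to use the strict equality (===) and strict inequality (!==) operators so that a non-string value never matches by accident.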
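To make the coercion issue concrete, here is a minimal sketch. The variable name input is hypothetical and stands in for any value that arrives as a string, such as a form field:

```javascript
const input = '5'; // hypothetical value, e.g. read from a form field

// Loose equality converts the string to a number before comparing.
const looseMatch = input == 5;
// Strict equality compares type and value, with no conversion.
const strictMatch = input === 5;

console.log(looseMatch);  // true
console.log(strictMatch); // false

console.log('' == 0);     // true: the empty string converts to 0
console.log('' === 0);    // false
console.log('5' === '5'); // true: same type, same characters
```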
2. Unicode Strings and String Comparison in JavaScript
In JavaScript, strings are represented using the UTF-16 encoding of Unicode characters. Unicode is a standard that assigns unique code points to characters from various languages, symbols, and scripts, ensuring consistent representation across different platforms and systems.
UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding, which represents Unicode characters using either one or two 16-bit code units (2 or 4 bytes). The first 65,536 Unicode code points (from U+0000 to U+FFFF) are represented using a single 16-bit code unit, while the remaining characters (from U+10000 to U+10FFFF) are represented using two 16-bit code units, known as a surrogate pair.
When you use JavaScript string methods and properties, such as length, charAt(), or charCodeAt(), they operate on the 16-bit code units rather than the full Unicode characters. This means that if you have a string containing a character represented by a surrogate pair, the length property will return a value that is greater by one than you might expect, and charAt() or charCodeAt() will return the individual surrogate code units.
For example:
const str = '\u{20213}A'; // '𠈓A'; U+20213 lies outside the Basic Multilingual Plane
console.log(str.length); // 3, not 2
console.log(str.charAt(0)); // '\uD840' (high surrogate), not '𠈓'
console.log(str.charAt(1)); // '\uDE13' (low surrogate), not 'A'
console.log(str.charAt(2)); // 'A'
console.log(str.charCodeAt(0)); // 55360 (0xD840), not 131603
console.log(str.charCodeAt(1)); // 56851 (0xDE13)
console.log(str.charCodeAt(2)); // 65
To properly handle Unicode strings in JavaScript, you can use the String.fromCodePoint() and String.prototype.codePointAt() methods, which were introduced in ECMAScript 6 (ES2015) to work with Unicode code points directly:
const str = '\u{20213}A'; // '𠈓A'
console.log(str.codePointAt(0)); // 131603 (0x20213)
console.log(str.codePointAt(1)); // 56851: index 1 is the low surrogate of the pair, not 'A'
console.log(str.codePointAt(2)); // 65
console.log(String.fromCodePoint(131603)); // '𠈓'
console.log(String.fromCodePoint(65)); // 'A'
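Since codePointAt() still takes an index measured in code units, it is often simpler to let the string iterator split the text into whole code points. The sketch below is one way to do that, using the same U+20213 character written as an escape sequence:

```javascript
const text = '\u{20213}A'; // one character outside the BMP, followed by 'A'

// Array.from() and for...of use the string iterator, which yields whole
// code points instead of individual UTF-16 code units.
const codePoints = Array.from(text);

console.log(text.length);       // 3 (UTF-16 code units)
console.log(codePoints.length); // 2 (code points)

for (const ch of text) {
  // Logs '20213', then '41'
  console.log(ch.codePointAt(0).toString(16));
}
```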
3. Using the localeCompare() Method
The localeCompare() method compares two strings according to the collation (sort order) rules of a locale, by default the runtime's current locale, and returns a number indicating whether the first string sorts before, after, or the same as the second.
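Syntax:
string1.localeCompare(string2);
Return values:
- If string1 comes before string2, a negative number is returned.
- If string1 is equal to string2, 0 is returned.
- If string1 comes after string2, a positive number is returned.
The exact negative and positive values are implementation-dependent, so check the sign of the result rather than comparing it against -1 or 1.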
Example:
const str1 = 'apple';
const str2 = 'banana';
const str3 = 'apple';
console.log(str1.localeCompare(str2)); // negative (-1 in most engines)
console.log(str1.localeCompare(str3)); // 0
console.log(str2.localeCompare(str1)); // positive (1 in most engines)
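Only the sign of the result is guaranteed, so code that branches on it is safer testing for less than or greater than zero. The sketch below also contrasts localeCompare() with the < operator; the 'en' locale is passed explicitly here as an assumption so the result does not depend on the machine's settings:

```javascript
const lowerA = 'a';
const upperB = 'B';

// Code unit comparison: 'a' (97) is greater than 'B' (66).
console.log(lowerA < upperB); // false

// Locale-aware comparison: 'a' sorts before 'B' in English.
const order = lowerA.localeCompare(upperB, 'en');
console.log(order < 0); // true

// Math.sign() normalizes the result to -1, 0, or 1 if an exact value is needed.
const orderSign = Math.sign(order);
console.log(orderSign); // -1
```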
Comparison operators and localeCompare() are two different ways to compare strings in JavaScript, each with its own use cases and characteristics. Here's an overview of the differences between the two:
Comparison Operators (<, >, <=, >=, ==, !=, ===, !==)
1. Character code-based comparison:
Comparison operators compare strings based on the UTF-16 code unit values of their characters. The relational operators perform a lexicographic comparison, comparing code units one by one from the beginning of the strings.
2. Binary comparison:
This method does not take into account any language-specific rules, such as collation order or case folding. It is purely based on the binary values of the code units.
3. Type coercion:
Using == and != can lead to unintended behavior due to type coercion. It is recommended to use the strict equality (===) and strict inequality (!==) operators to avoid these issues.
4. Performance:
Comparison operators are usually faster than localeCompare() since they perform a simple binary comparison without considering any localization rules.
localeCompare()
1. Language-aware comparison:
The localeCompare() method compares strings using the collation rules of the current (or an explicitly specified) locale rather than raw code unit values. This gives a more natural ordering for text in different languages, because it accounts for language-specific sorting rules, diacritics, and letter case.
2. Customizable comparison:
You can pass additional options to localeCompare() to customize the comparison, such as sensitivity (base, accent, case, or variant) or numeric sorting (for strings containing numbers). This allows for more fine-grained control over the comparison process.
3. Return value:
Unlike comparison operators, which return a boolean result, localeCompare() returns a number indicating the order of the strings (negative, 0, or positive). This makes it directly usable in a compare function passed to the sort() method.
4. Performance:
localeCompare() is generally slower than comparison operators since it needs to consider localization rules and additional options.
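When many comparisons share the same locale and options, as when sorting a large array, the related Intl.Collator API lets you build the comparer once instead of passing the locale and options on every localeCompare() call. The following is only a sketch: the file names are invented for illustration, and any speed benefit should be measured in your own environment.

```javascript
// Build the comparer once; its compare() function is already bound to the collator.
const collator = new Intl.Collator('en', { sensitivity: 'base', numeric: true });

const files = ['file10.txt', 'File2.txt', 'file1.txt']; // hypothetical names

// Copy before sorting so the original array is left untouched.
const sortedFiles = [...files].sort(collator.compare);

console.log(sortedFiles); // [ 'file1.txt', 'File2.txt', 'file10.txt' ]
```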
4. Case-Insensitive Comparisons
When comparing strings, it's often necessary to perform a case-insensitive comparison. You can do this by converting both strings to either uppercase or lowercase using the toUpperCase() or toLowerCase() methods before performing the comparison.
Example:
const str1 = 'Apple';
const str2 = 'apple';
// Case-insensitive comparison using toLowerCase()
console.log(str1.toLowerCase() === str2.toLowerCase()); // true
// Case-insensitive comparison using toUpperCase()
console.log(str1.toUpperCase() === str2.toUpperCase()); // true
// Case-insensitive localeCompare
console.log(str1.toLowerCase().localeCompare(str2.toLowerCase())); // 0
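Case conversion works well for plain ASCII text, but the two methods do not always agree for other scripts. As an illustrative sketch, the German sharp s has no single-character uppercase form, so lowercasing and uppercasing give different answers for the same pair of strings:

```javascript
const lower = 'straße';  // German word containing the sharp s
const upper = 'STRASSE';

// Lowercasing leaves 'ß' unchanged, so the two strings still differ.
const equalViaLower = lower.toLowerCase() === upper.toLowerCase();
// Uppercasing maps 'ß' to 'SS', so the two strings match.
const equalViaUpper = lower.toUpperCase() === upper.toUpperCase();

console.log(equalViaLower); // false
console.log(equalViaUpper); // true
console.log('ß'.toUpperCase()); // 'SS'
```

If input may contain such characters, a locale-aware comparison is usually the more predictable choice.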
Case-insensitive comparison using localeCompare()
You can perform a case-insensitive string comparison using the localeCompare() method by providing an options object with the sensitivity property set to 'base'. This configuration ensures that the comparison ignores differences in case and accents.
Here’s an example:
const str1 = 'Apple';
const str2 = 'apple';
const options = { sensitivity: 'base' };
const result = str1.localeCompare(str2, undefined, options);
console.log(result); // 0, indicating the strings are equal when ignoring case and accents
In this example, str1 and str2 are considered equal because the comparison is case-insensitive. The undefined argument stands for the locale; when it is undefined, the runtime's default locale is used. The options object specifies the comparison's sensitivity settings.
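This pattern is convenient to wrap in a small helper. The function below is a hypothetical sketch, not a built-in; its name and the default 'en' locale are assumptions chosen for the example:

```javascript
// Hypothetical helper: true when two strings match, ignoring case and accents.
function equalsIgnoringCaseAndAccents(a, b, locale = 'en') {
  return a.localeCompare(b, locale, { sensitivity: 'base' }) === 0;
}

console.log(equalsIgnoringCaseAndAccents('Résumé', 'resume')); // true
console.log(equalsIgnoringCaseAndAccents('APPLE', 'apple'));   // true
console.log(equalsIgnoringCaseAndAccents('apple', 'apply'));   // false
```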
Note that you can also use the caseFirst option to control the order of upper and lower case characters. It only has an effect when case is actually taken into account, so it should not be combined with sensitivity: 'base' (or 'accent'), which makes strings that differ only in case compare as equal:
const str1 = 'apple';
const str2 = 'Apple';
const options = { caseFirst: 'upper' };
const result = str1.localeCompare(str2, undefined, options);
console.log(result); // positive (typically 1), since 'Apple' comes before 'apple' when uppercase comes first
In this example, we set the caseFirst property to 'upper', which indicates that uppercase letters should come before lowercase letters. As a result, the comparison returns a positive number, meaning str1 comes after str2 when considering the case order.
Other options of localeCompare()
The options parameter of the localeCompare() method is an object that allows you to customize the behavior of the comparison. You can provide different properties in the options object to control various aspects of the comparison. Some of the most commonly used properties are:
- sensitivity: Controls the sensitivity of the comparison. It can have one of the following values:
  - 'base': Only distinguishes strings based on their base characters, ignoring case and diacritics.
  - 'accent': Distinguishes strings based on their base characters and accents, but ignores case.
  - 'case': Distinguishes strings based on their base characters and case, but ignores diacritics.
  - 'variant': Distinguishes strings based on their base characters, accents, and case.
- ignorePunctuation: A boolean value that, when set to true, causes the comparison to ignore punctuation wherever it appears in the compared strings, not only at the beginning and end.
- numeric: A boolean value that, when set to true, enables numeric sorting, which means that substrings of digits are sorted according to their numeric values instead of their Unicode code points. For example, '10' would come after '2' when numeric sorting is enabled.
- caseFirst: Specifies whether uppercase or lowercase characters should come first in the sorting order. It can have one of the following values:
  - 'upper': Uppercase characters come before lowercase characters.
  - 'lower': Lowercase characters come before uppercase characters.
  - 'false': Case order follows the locale's default rules. When caseFirst is omitted, the order is likewise determined by the locale.
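The effect of the numeric and sensitivity options is easiest to see side by side. The sketch below uses made-up sample data, a hypothetical helper named isEqualAt, and an explicit 'en' locale so that the results do not depend on the machine's settings:

```javascript
const versions = ['v10', 'v9', 'v2', 'v1']; // made-up sample data

// Without numeric: digits are compared one character at a time.
const plainOrder = [...versions].sort((a, b) => a.localeCompare(b, 'en'));
// With numeric: runs of digits are compared by numeric value.
const numericOrder = [...versions].sort((a, b) =>
  a.localeCompare(b, 'en', { numeric: true })
);
console.log(plainOrder);   // [ 'v1', 'v10', 'v2', 'v9' ]
console.log(numericOrder); // [ 'v1', 'v2', 'v9', 'v10' ]

// Hypothetical helper: are two strings equal at a given sensitivity level?
function isEqualAt(sensitivity, a, b) {
  return a.localeCompare(b, 'en', { sensitivity }) === 0;
}
console.log(isEqualAt('base', 'a', 'Á'));    // true: case and accent ignored
console.log(isEqualAt('accent', 'a', 'A'));  // true: case ignored
console.log(isEqualAt('accent', 'a', 'á'));  // false: accent matters
console.log(isEqualAt('case', 'a', 'á'));    // true: accent ignored
console.log(isEqualAt('case', 'a', 'A'));    // false: case matters
console.log(isEqualAt('variant', 'a', 'A')); // false: everything matters
```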
Here's an example of using the localeCompare() method with multiple options:
const str1 = 'apple1';
const str2 = 'Apple10';
const options = {
  sensitivity: 'base',
  numeric: true,
  caseFirst: 'upper',
  ignorePunctuation: true
};
const result = str1.localeCompare(str2, undefined, options);
console.log(result); // negative (typically -1), indicating 'apple1' comes before 'Apple10' with the specified options
In this example, we set the sensitivity property to 'base' for a case-insensitive and accent-insensitive comparison, the numeric property to true for numeric sorting, the caseFirst property to 'upper' to prefer uppercase letters over lowercase letters, and the ignorePunctuation property to true to ignore punctuation in the strings. Because 'base' sensitivity already ignores case, the result here is decided by the numeric comparison of 1 and 10.
Sorting an array of Unicode strings with localeCompare()
Here's an example of a function that sorts an array of Unicode strings using the localeCompare() method:
function sortUnicodeStrings(strings, locale = undefined, options = {}) {
  // Copy first: sort() works in place and would otherwise reorder the caller's array.
  return [...strings].sort((a, b) => a.localeCompare(b, locale, options));
}
const unsortedArray = ['apple', 'Banana', 'grape', 'árbol', 'Zorro', 'mañana'];
const sortedArray = sortUnicodeStrings(unsortedArray);
console.log(sortedArray);
// Output: [ 'apple', 'árbol', 'Banana', 'grape', 'mañana', 'Zorro' ]
In this example, we define a sortUnicodeStrings function that takes an array of strings, an optional locale parameter (defaulting to undefined), and an optional options object for customization. The function copies the array, sorts the copy with the Array.prototype.sort() method, and passes a compare function that leverages localeCompare() to perform the string comparison.
The unsortedArray contains a mix of English and Spanish words with different letter cases. When we pass this array to sortUnicodeStrings, it returns a new array sorted according to the locale-aware comparison rules and leaves unsortedArray unchanged.
You can also pass the locale and options arguments to the function to further customize the sorting behavior:
const locale = 'es'; // Use the Spanish locale
const options = { sensitivity: 'base', caseFirst: 'upper' };
const sortedArrayCustom = sortUnicodeStrings(unsortedArray, locale, options);
console.log(sortedArrayCustom);
// Output: [ 'apple', 'árbol', 'Banana', 'grape', 'mañana', 'Zorro' ]
// 'apple' precedes 'árbol' because 'a' and 'á' share a base letter and 'p' sorts before 'r'
In this example, we use the Spanish locale for sorting and set sensitivity to 'base' for a case-insensitive, accent-insensitive comparison. Because 'base' already treats strings that differ only in case as equal, the caseFirst option has no visible effect on this particular list.
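The choice of locale matters most when a language treats an accented character as a separate letter. In Spanish, ñ is its own letter that follows n, whereas English collation treats it as an n with a diacritic. The sketch below uses three Spanish words chosen for illustration and assumes the runtime ships collation data for both locales:

```javascript
const palabras = ['caña', 'canto', 'cana'];

// English rules: 'ñ' is ordered like 'n' with an accent, so 'caña' sits next to 'cana'.
const englishOrder = [...palabras].sort((a, b) => a.localeCompare(b, 'en'));
// Spanish rules: 'ñ' is a distinct letter after 'n', so 'caña' comes last.
const spanishOrder = [...palabras].sort((a, b) => a.localeCompare(b, 'es'));

console.log(englishOrder); // [ 'cana', 'caña', 'canto' ]
console.log(spanishOrder); // [ 'cana', 'canto', 'caña' ]
```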
In this tutorial, we discussed various methods to compare strings in JavaScript, including comparison operators, the localeCompare() method, and case-insensitive comparisons. Understanding these techniques is useful when working with strings, as they allow you to perform tasks like sorting, searching, and input validation.