@node Locales, Searching and Sorting, Extended Characters, Top
@chapter Locales and Internationalization

Different countries and cultures have varying conventions for how to
communicate. These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as

@cindex internationalization
@dfn{Internationalization} of software means programming it to be able
to adapt to the user's favorite conventions. In ANSI C,
internationalization works by means of @dfn{locales}. Each locale
specifies a collection of conventions, one convention for each purpose.
The user chooses a set of conventions by specifying a locale (via
environment variables).

All programs inherit the chosen locale as part of their environment.
Provided the programs are written to obey the choice of locale, they
will follow the conventions preferred by the user.

* Effects of Locale:: Actions affected by the choice of
* Choosing Locale:: How the user specifies a locale.
* Locale Categories:: Different purposes for which you can
* Setting the Locale:: How a program specifies the locale
                        with library functions.
* Standard Locales:: Locale names available on all systems.
* Numeric Formatting:: How to format numbers according to the

@node Effects of Locale, Choosing Locale, , Locales
@section What Effects a Locale Has

Each locale specifies conventions for several purposes, including the

What multibyte character sequences are valid, and how they are
interpreted (@pxref{Extended Characters}).

Classification of which characters in the local character set are
considered alphabetic, and upper- and lower-case conversion conventions
(@pxref{Character Handling}).

The collating sequence for the local language and character set
(@pxref{Collation Functions}).

Formatting of numbers and currency amounts.

Formatting of dates and times (@pxref{Formatting Date and Time}).

What language to use for output, including error messages.
(The C library doesn't yet help you implement this.)

What language to use for user answers to yes-or-no questions.

What language to use for more complex user input.
(The C library doesn't yet help you implement this.)

Some aspects of adapting to the specified locale are handled
automatically by the library subroutines. For example, all your program
needs to do in order to use the collating sequence of the chosen locale
is to compare strings with @code{strcoll}, or to transform them with
@code{strxfrm} and compare the results.
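
For instance, a program can sort an array of strings in the order the
user expects simply by giving @code{qsort} a comparison function built on
@code{strcoll}. Here is a minimal sketch; the names
@code{compare_collated} and @code{sort_strings} are illustrative, not
part of the library, and it assumes the program has already selected the
user's locale with @code{setlocale} (@pxref{Setting the Locale}).

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison function for an array of char * (hypothetical name).
   strcoll compares according to the LC_COLLATE category of the current
   locale, so the resulting order follows the user's language.  */
static int
compare_collated (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Sort N strings in place using the current collating sequence.  */
void
sort_strings (const char **strings, size_t n)
{
  qsort (strings, n, sizeof *strings, compare_collated);
}
```

If the same strings are compared many times, transforming each one once
with @code{strxfrm} and comparing the results with @code{strcmp} can be
cheaper (@pxref{Collation Functions}).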

Other aspects of locales are beyond what the library can handle for you.
For example, the library can't automatically translate your program's
output messages into other languages. The only way you can support
output in the user's favorite language is to program this more or less
by hand. (Eventually, we hope to provide facilities to make this

This chapter discusses the mechanism by which you can modify the current
locale. The effects of the current locale on specific library functions
are discussed in more detail in the descriptions of those functions.

@node Choosing Locale, Locale Categories, Effects of Locale, Locales
@section Choosing a Locale

The simplest way for the user to choose a locale is to set the
environment variable @code{LANG}. This specifies a single locale to use
for all purposes. For example, a user could specify a hypothetical
locale named @samp{espana-castellano} to use the standard conventions of

The set of locales supported depends on the operating system you are
using, and so do their names. We can't make any promises about what
locales will exist, except for one standard locale called @samp{C} or

@cindex combining locales
A user also has the option of specifying different locales for different
purposes---in effect, choosing a mixture of two locales.

For example, the user might specify the locale @samp{espana-castellano}
for most purposes, but specify the locale @samp{usa-english} for
currency formatting. This might make sense if the user is a
Spanish-speaking American, working in Spanish, but representing monetary
amounts in US dollars.

Note that both locales @samp{espana-castellano} and @samp{usa-english},
like all locales, would include conventions for all of the purposes to
which locales apply. However, the user can choose to use each locale
for a particular subset of those purposes.
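
A program can set up the same kind of mixture itself by calling
@code{setlocale} once for all categories and once more for the category
to override. The sketch below is only an illustration:
@code{set_mixed_locale} is a hypothetical helper, and it puts the
previous combination back if either locale name is not installed.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: use GENERAL for all categories, then MONETARY
   for LC_MONETARY only.  Returns 0 on success.  On failure it restores
   the previous combination of locales and returns -1.  */
int
set_mixed_locale (const char *general, const char *monetary)
{
  /* Copy the current setting at once: later setlocale calls may
     overwrite the string it points to.  */
  const char *current = setlocale (LC_ALL, NULL);
  if (current == NULL)
    return -1;
  size_t len = strlen (current) + 1;
  char *saved = malloc (len);
  if (saved == NULL)
    return -1;
  memcpy (saved, current, len);

  int ok = setlocale (LC_ALL, general) != NULL
           && setlocale (LC_MONETARY, monetary) != NULL;
  if (!ok)
    setlocale (LC_ALL, saved);   /* undo a partial change */
  free (saved);
  return ok ? 0 : -1;
}
```

With the hypothetical names above, the call would be
@code{set_mixed_locale ("espana-castellano", "usa-english")}. Most
programs, however, should leave this choice to the user and simply
select whatever the environment specifies.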

@node Locale Categories, Setting the Locale, Choosing Locale, Locales
@section Categories of Activities that Locales Affect
@cindex categories for locales
@cindex locale categories

The purposes that locales serve are grouped into @dfn{categories}, so
that a user or a program can choose the locale for each category
independently. Here is a table of categories; each name is both an
environment variable that a user can set, and a macro name that you can
use as an argument to @code{setlocale}.

This category applies to collation of strings (functions @code{strcoll}
and @code{strxfrm}); see @ref{Collation Functions}.

This category applies to classification and conversion of characters;
see @ref{Character Handling}.

This category applies to formatting monetary values; see @ref{Numeric
This category applies to formatting numeric values that are not
monetary; see @ref{Numeric Formatting}.

This category applies to formatting date and time values; see
@ref{Formatting Date and Time}.

@ignore This is apparently a feature that was in some early
draft of the POSIX.2 standard, but it's not listed in draft 11. Do we
still support this anyway? Is there a corresponding environment
This category applies to recognizing ``yes'' or ``no'' responses to
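
This is not a category of its own; it is a macro that you can use with
@code{setlocale} to set a single locale for all purposes. (On POSIX
systems, @code{LC_ALL} is also an environment variable; if it is set, it
overrides all of the other variables.)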

If this environment variable is defined, its value specifies the locale
to use for all purposes except as overridden by the variables above.

@node Setting the Locale, Standard Locales, Locale Categories, Locales
@section How Programs Set the Locale

A C program inherits its locale environment variables when it starts up.
This happens automatically. However, these variables do not
automatically control the locale used by the library functions, because
ANSI C says that all programs start by default in the standard @samp{C}
locale. To use the locales specified by the environment, you must call
@code{setlocale}. Call it as follows:
setlocale (LC_ALL, "");
to select a locale based on the appropriate environment variables.
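
If the environment names a locale that is not installed, this call
returns a null pointer and the program keeps running in the @samp{C}
locale. A minimal startup sketch that reports the problem instead of
silently ignoring it might look like this; @code{init_locale} is a
hypothetical name.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical startup helper.  Select the locale named by the
   environment; if that fails, warn and keep the current locale (the
   standard "C" locale right after startup).  Returns the name of the
   locale now in effect for LC_ALL.  */
const char *
init_locale (void)
{
  const char *name = setlocale (LC_ALL, "");
  if (name == NULL)
    {
      fputs ("warning: requested locale is not available; using \"C\"\n",
             stderr);
      name = setlocale (LC_ALL, NULL);
    }
  return name;
}
```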

@cindex changing the locale
@cindex locale, changing
You can also use @code{setlocale} to specify a particular locale, for
general use or for a specific category.
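
To see which locale is in effect for each category, for example after
the user has set @code{LANG} and some of the category variables to
different values, a program can pass a null pointer as the second
argument, which only queries the setting. The table and function names
below are hypothetical, and the table lists only the categories
described in this chapter.

```c
#include <locale.h>
#include <stdio.h>

/* The categories described in this chapter, with printable names.  */
static const struct
{
  int category;
  const char *name;
} locale_categories[] = {
  { LC_COLLATE, "LC_COLLATE" },
  { LC_CTYPE, "LC_CTYPE" },
  { LC_MONETARY, "LC_MONETARY" },
  { LC_NUMERIC, "LC_NUMERIC" },
  { LC_TIME, "LC_TIME" },
};

/* Print one line per category to STREAM.  A null second argument makes
   setlocale report the current setting without changing it.  Returns
   the number of categories that could not be queried.  */
int
print_locale_categories (FILE *stream)
{
  int failures = 0;
  size_t count = sizeof locale_categories / sizeof locale_categories[0];
  for (size_t i = 0; i < count; i++)
    {
      const char *current = setlocale (locale_categories[i].category, NULL);
      if (current == NULL)
        failures++;
      else
        fprintf (stream, "%s=%s\n", locale_categories[i].name, current);
    }
  return failures;
}
```

Calling @code{setlocale (LC_ALL, "")} first and then
@code{print_locale_categories (stdout)} shows the mixture the user
selected.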

The symbols in this section are defined in the header file @file{locale.h}.

@deftypefun {char *} setlocale (int @var{category}, const char *@var{locale})
The function @code{setlocale} sets the current locale for
category @var{category} to @var{locale}.

If @var{category} is @code{LC_ALL}, this specifies the locale for all
purposes. The other possible values of @var{category} specify an
individual purpose (@pxref{Locale Categories}).

You can also use this function to find out the current locale by passing
a null pointer as the @var{locale} argument. In this case,
@code{setlocale} returns a string that is the name of the locale
currently selected for category @var{category}.

The string returned by @code{setlocale} can be overwritten by subsequent
calls, so you should make a copy of the string (@pxref{Copying and
Concatenation}) if you want to save it past any further calls to
@code{setlocale}. (The standard library is guaranteed never to call
@code{setlocale} itself.)

You should not modify the string returned by @code{setlocale}.
It might be the same string that was passed as an argument in a
previous call to @code{setlocale}.

When you read the current locale for category @code{LC_ALL}, the value
encodes the entire combination of selected locales for all categories.
In this case, the value is not just a single locale name. In fact, we
don't make any promises about what it looks like. But if you specify
the same ``locale name'' with @code{LC_ALL} in a subsequent call to
@code{setlocale}, it restores the same combination of locale selections.

When the @var{locale} argument is not a null pointer, the string returned
by @code{setlocale} reflects the newly modified locale.

If you specify an empty string for @var{locale}, this means to read the
appropriate environment variable and use its value to select the locale

If you specify an invalid locale name, @code{setlocale} returns a null
pointer and leaves the current locale unchanged.

Here is an example showing how you might use @code{setlocale} to
temporarily switch to a new locale.
with_other_locale (char *new_locale,
                   void (*subroutine) (int),
  char *old_locale, *saved_locale;

  /* @r{Get the name of the current locale.} */
  old_locale = setlocale (LC_ALL, NULL);

  /* @r{Copy the name so it won't be clobbered by @code{setlocale}.} */
  saved_locale = strdup (old_locale);
  if (saved_locale == NULL)
    fatal ("Out of memory");

  /* @r{Now change the locale and do some stuff with it.} */
  setlocale (LC_ALL, new_locale);
  (*subroutine) (argument);

  /* @r{Restore the original locale.} */
  setlocale (LC_ALL, saved_locale);

@strong{Portability Note:} Some ANSI C systems may define additional
locale categories. For portability, assume that any symbol beginning
with @samp{LC_} might be defined in @file{locale.h}.
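
Because the string that @code{setlocale} returns for @code{LC_ALL}
encodes the whole combination of categories, saving a copy of it is
enough to put every category back later. The two helpers below sketch
that pattern; their names are hypothetical, and they copy with
@code{malloc} and @code{memcpy} rather than @code{strdup} so that they
rely only on ANSI C functions.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the current LC_ALL
   setting, or NULL on failure.  A copy is needed because later calls
   to setlocale may overwrite the string it returns.  */
char *
save_locale (void)
{
  const char *current = setlocale (LC_ALL, NULL);
  if (current == NULL)
    return NULL;
  size_t len = strlen (current) + 1;
  char *copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, current, len);
  return copy;
}

/* Reinstate a combination saved by save_locale, then free SAVED.
   Returns 0 on success, -1 if setlocale rejected the saved name.  */
int
restore_locale (char *saved)
{
  int result = setlocale (LC_ALL, saved) != NULL ? 0 : -1;
  free (saved);
  return result;
}
```

Between the two calls the program may change individual categories
freely; @code{restore_locale} undoes all of those changes at once.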

@node Standard Locales, Numeric Formatting, Setting the Locale, Locales
@section Standard Locales

The only locale names you can count on finding on all operating systems
are these three standard ones:

This is the standard C locale. The attributes and behavior it provides
are specified in the ANSI C standard. When your program starts up, it
initially uses this locale by default.

This is the standard POSIX locale. Currently, it is an alias for the

The empty name says to select a locale based on the user's environment
variables (@pxref{Locale Categories}). It's supposed to be a good default
for the user and the machine on which the program is running.

Defining and installing named locales is normally a responsibility of
the system administrator at your site (or the person who installed the
GNU C library). Some systems may allow users to create locales, but
we don't discuss that here.
@c ??? If we give the GNU system that capability, this place will have
@c ??? to be changed.

If your program needs to use something other than the @samp{C} locale,
it will be more portable if you use whatever locale the user
specifies with the environment, rather than trying to specify some
non-standard locale explicitly by name. Remember, different machines
might have different sets of locales installed.

@node Numeric Formatting, , Standard Locales, Locales
@section Numeric Formatting

When you want to format a number or a currency amount using the
conventions of the current locale, you can use the function
@code{localeconv} to get the data on how to do it. The function
@code{localeconv} is declared in the header file @file{locale.h}.

@cindex monetary value formatting
@cindex numeric value formatting

@deftypefun {struct lconv *} localeconv (void)
The @code{localeconv} function returns a pointer to a structure whose
components contain information about how numeric and monetary values
should be formatted in the current locale.

You shouldn't modify the structure or its contents. The structure might
be overwritten by subsequent calls to @code{localeconv}, or by calls to
@code{setlocale}, but no other function in the library overwrites this

@deftp {Data Type} {struct lconv}
This is the data type of the value returned by @code{localeconv}.

If a member of the structure @code{struct lconv} has type @code{char},
and the value is @code{CHAR_MAX}, it means that the current locale has
no value for that parameter.
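
Since the structure may be overwritten later, a program that needs
these values for some time should copy the parts it uses. The sketch
below does that for three @code{LC_NUMERIC} members; the structure
@code{numeric_format}, its buffer sizes, and the function names are
hypothetical choices, not part of the library.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical private copy of some struct lconv members.  */
struct numeric_format
{
  char decimal_point[8];
  char thousands_sep[8];
  char grouping[16];
};

/* Copy SRC into DEST of SIZE bytes, truncating if needed (a very long
   multibyte separator could be cut; this sketch does not handle it).  */
static void
copy_field (char *dest, size_t size, const char *src)
{
  strncpy (dest, src, size - 1);
  dest[size - 1] = '\0';
}

/* Fill FMT from the current locale, so later localeconv or setlocale
   calls cannot change the values behind our back.  */
void
get_numeric_format (struct numeric_format *fmt)
{
  const struct lconv *lc = localeconv ();
  copy_field (fmt->decimal_point, sizeof fmt->decimal_point,
              lc->decimal_point);
  copy_field (fmt->thousands_sep, sizeof fmt->thousands_sep,
              lc->thousands_sep);
  copy_field (fmt->grouping, sizeof fmt->grouping, lc->grouping);
}
```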

* General Numeric:: Parameters for formatting numbers and
* Currency Symbol:: How to print the symbol that identifies an
                     amount of money (e.g. @samp{$}).
* Sign of Money Amount:: How to print the (positive or negative) sign
                          for a monetary amount, if one exists.

@node General Numeric, Currency Symbol, , Numeric Formatting
@subsection Generic Numeric Formatting Parameters

These are the standard members of @code{struct lconv}; there may be

@item char *decimal_point
@itemx char *mon_decimal_point
These are the decimal-point separators used in formatting non-monetary
and monetary quantities, respectively. In the @samp{C} locale, the
value of @code{decimal_point} is @code{"."}, and the value of
@code{mon_decimal_point} is @code{""}.
@cindex decimal-point separator

@item char *thousands_sep
@itemx char *mon_thousands_sep
These are the separators used to delimit groups of digits to the left of
the decimal point in formatting non-monetary and monetary quantities,
respectively. In the @samp{C} locale, both members have a value of
@code{""} (the empty string).

@itemx char *mon_grouping
These are strings that specify how to group the digits to the left of
the decimal point. @code{grouping} applies to non-monetary quantities
and @code{mon_grouping} applies to monetary quantities. Use either
@code{thousands_sep} or @code{mon_thousands_sep} to separate the digit
@cindex grouping of digits

Each string is a sequence of small integers, each stored in a single
@code{char}. Successive numbers (from left to right) give the sizes of
successive groups (from right to left, starting at the decimal point).
If the string ends right after a number (that is, the next element is
the terminating zero), that number is used over and over for all the
remaining groups.

If the last number is @code{CHAR_MAX}, it means that there is no more
grouping; put another way, any remaining digits form one large group
without separators.

For example, if @code{grouping} is @code{"\4\3\2"}, the number
@code{123456787654321} should be grouped into @samp{12}, @samp{34},
@samp{56}, @samp{78}, @samp{765}, @samp{4321}. This uses a group of 4
digits at the end, preceded by a group of 3 digits, preceded by groups
of 2 digits (as many as needed). With a separator of @samp{,}, the
number would be printed as @samp{12,34,56,78,765,4321}.

A value of @code{"\3"} indicates repeated groups of three digits, as
normally used in the U.S.
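
The following function sketches how a program might insert separators
into a string of digits according to a @code{grouping} value. It
assumes the digits contain no sign or decimal point, and the name
@code{group_digits} is hypothetical. Each element of the grouping string
is read as a small integer stored in one @code{char}, as ANSI C defines
this member: a terminating zero repeats the last group size, and
@code{CHAR_MAX} (or a negative value) stops grouping.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read one grouping element as a group size;
   0 means "no (more) grouping".  */
static int
group_size (char c)
{
  return (c > 0 && c != CHAR_MAX) ? c : 0;
}

/* Copy DIGITS to OUT, inserting SEP between groups as described by
   GROUPING.  Returns 0 on success, -1 if OUTSIZE is too small.  */
int
group_digits (const char *digits, const char *grouping, const char *sep,
              char *out, size_t outsize)
{
  size_t n = strlen (digits), seplen = strlen (sep);
  size_t len = 0, in_group = 0;
  const char *g = grouping;
  int size = group_size (*g);

  if (outsize == 0)
    return -1;

  /* Work from the rightmost digit, building OUT in reverse order.  */
  for (size_t i = n; i > 0; i--)
    {
      if (size > 0 && in_group == (size_t) size)
        {
          if (len + seplen >= outsize)
            return -1;
          for (size_t k = seplen; k > 0; k--)   /* SEP reversed too */
            out[len++] = sep[k - 1];
          in_group = 0;
          /* A terminating zero means: keep the current size.  */
          if (g[1] != '\0')
            size = group_size (*++g);
        }
      if (len + 1 >= outsize)
        return -1;
      out[len++] = digits[i - 1];
      in_group++;
    }

  /* Reverse OUT into normal reading order.  */
  for (size_t a = 0, b = len; a + 1 < b; a++, b--)
    {
      char tmp = out[a];
      out[a] = out[b - 1];
      out[b - 1] = tmp;
    }
  out[len] = '\0';
  return 0;
}
```

To format according to the current locale, fetch @code{struct lconv *lc
= localeconv ()} once and pass @code{lc->grouping} and
@code{lc->thousands_sep} (or the @code{mon_} variants for money).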

In the standard @samp{C} locale, both @code{grouping} and
@code{mon_grouping} have a value of @code{""}. This value specifies no

@item char int_frac_digits
@itemx char frac_digits
These are small integers indicating how many fractional digits (to the
right of the decimal point) should be displayed in a monetary value in
international and local formats, respectively. (Most often, both
members have the same value.)
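
When printing with @code{printf}, these members can supply the
precision. The sketch below uses hypothetical names; it maps
@code{CHAR_MAX} (``unspecified'') to zero digits, following this
section's recommendation, and also treats a negative value as zero since
it makes no sense as a digit count. Note that @code{printf} uses the
@code{LC_NUMERIC} decimal point, not @code{mon_decimal_point}.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: turn an int_frac_digits or frac_digits value
   into a digit count, using 0 when the locale leaves it unspecified.  */
int
money_frac_digits (char value)
{
  if (value == CHAR_MAX || value < 0)
    return 0;
  return value;
}

/* Format AMOUNT with the number of fractional digits given by
   FRAC_DIGITS.  The decimal point comes from LC_NUMERIC, because that
   is what printf uses.  Returns what snprintf returns.  */
int
format_money_digits (char *buf, size_t size, double amount, char frac_digits)
{
  return snprintf (buf, size, "%.*f", money_frac_digits (frac_digits),
                   amount);
}
```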

In the standard @samp{C} locale, both of these members have the value
@code{CHAR_MAX}, meaning ``unspecified''. The ANSI standard doesn't say
what to do when you find this value; we recommend printing no
fractional digits. (This locale also specifies the empty string for
@code{mon_decimal_point}, so printing any fractional digits would be

@node Currency Symbol, Sign of Money Amount, General Numeric, Numeric Formatting
@subsection Printing the Currency Symbol
@cindex currency symbols

These members of the @code{struct lconv} structure specify how to print
the symbol to identify a monetary value---the international analog of
@samp{$} for US dollars.

Each country has two standard currency symbols. The @dfn{local currency
symbol} is used commonly within the country, while the
@dfn{international currency symbol} is used internationally to refer to
that country's currency when it is necessary to indicate the country

For example, many countries use the dollar as their monetary unit, and
when dealing with international currencies it's important to specify
that one is dealing with (say) Canadian dollars instead of U.S. dollars
or Australian dollars. But when the context is known to be Canada,
there is no need to make this explicit---dollar amounts are implicitly
assumed to be in Canadian dollars.

@item char *currency_symbol
The local currency symbol for the selected locale.

In the standard @samp{C} locale, this member has a value of @code{""}
(the empty string), meaning ``unspecified''. The ANSI standard doesn't
say what to do when you find this value; we recommend you simply print
the empty string as you would print any other string found in the

@item char *int_curr_symbol
The international currency symbol for the selected locale.

The value of @code{int_curr_symbol} should normally consist of a
three-letter abbreviation determined by the international standard
@cite{ISO 4217 Codes for the Representation of Currency and Funds},
followed by a one-character separator (often a space).

In the standard @samp{C} locale, this member has a value of @code{""}
(the empty string), meaning ``unspecified''. We recommend you simply
print the empty string as you would print any other string found in the

@item char p_cs_precedes
@itemx char n_cs_precedes
These members are @code{1} if the @code{currency_symbol} string should
precede the value of a monetary amount, or @code{0} if the string should
follow the value. The @code{p_cs_precedes} member applies to positive
amounts (or zero), and the @code{n_cs_precedes} member applies to

In the standard @samp{C} locale, both of these members have a value of
@code{CHAR_MAX}, meaning ``unspecified''. The ANSI standard doesn't say
what to do when you find this value, but we recommend printing the
currency symbol before the amount. That's right for most countries.
In other words, treat all nonzero values alike in these members.

The POSIX standard says that these two members apply to the
@code{int_curr_symbol} as well as the @code{currency_symbol}. The ANSI
C standard seems to imply that they should apply only to the
@code{currency_symbol}---so the @code{int_curr_symbol} should always

We can only guess which of these (if either) matches the usual
conventions for printing international currency symbols. Our guess is
that they should always precede the amount. If we find out a reliable
answer, we will put it here.

@item char p_sep_by_space
@itemx char n_sep_by_space
These members are @code{1} if a space should appear between the
@code{currency_symbol} string and the amount, or @code{0} if no space
should appear. The @code{p_sep_by_space} member applies to positive
amounts (or zero), and the @code{n_sep_by_space} member applies to

In the standard @samp{C} locale, both of these members have a value of
@code{CHAR_MAX}, meaning ``unspecified''. The ANSI standard doesn't say
what you should do when you find this value; we suggest you treat it as
one (print a space). In other words, treat all nonzero values alike in

These members apply only to @code{currency_symbol}. When you use
@code{int_curr_symbol}, you never print an additional space, because
@code{int_curr_symbol} itself contains the appropriate separator.

The POSIX standard says that these two members apply to the
@code{int_curr_symbol} as well as the @code{currency_symbol}. But an
example in the ANSI C standard clearly implies that they should apply
only to the @code{currency_symbol}---that the @code{int_curr_symbol}
contains any appropriate separator, so you should never print an

Based on what we know now, we recommend you ignore these members when
printing international currency symbols, and print no extra space.
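
Putting these recommendations together, a program could attach the
local currency symbol to an already formatted amount as sketched below.
The function name is hypothetical; any nonzero value, including
@code{CHAR_MAX}, means ``precede'' or ``separate with a space'', as
recommended above. Omitting the space when the symbol is empty is our
own assumption, made so that an unspecified symbol leaves no stray
blank.

```c
#include <stdio.h>

/* Hypothetical helper.  Place SYMBOL before or after AMOUNT (digits
   already formatted) according to CS_PRECEDES, with a space between
   them if SEP_BY_SPACE is nonzero.  Pass the p_ members for positive
   amounts and the n_ members for negative ones.  */
int
attach_currency_symbol (char *buf, size_t size, const char *amount,
                        const char *symbol, char cs_precedes,
                        char sep_by_space)
{
  /* Assumption: no space next to an empty (unspecified) symbol.  */
  const char *space = (sep_by_space != 0 && symbol[0] != '\0') ? " " : "";

  if (cs_precedes != 0)
    return snprintf (buf, size, "%s%s%s", symbol, space, amount);
  return snprintf (buf, size, "%s%s%s", amount, space, symbol);
}
```

For the international format, pass @code{int_curr_symbol} with a
@var{sep_by_space} of @code{0}, since that symbol carries its own
separator: @code{"USD "} and @code{"1.00"} then give @samp{USD 1.00}.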

@node Sign of Money Amount, , Currency Symbol, Numeric Formatting
@subsection Printing the Sign of an Amount of Money

These members of the @code{struct lconv} structure specify how to print
the sign (if any) in a monetary value.

@item char *positive_sign
@itemx char *negative_sign
These are strings used to indicate positive (or zero) and negative
(respectively) monetary quantities.

In the standard @samp{C} locale, both of these members have a value of
@code{""} (the empty string), meaning ``unspecified''.
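
The helper below sketches the fallback recommended in the next
paragraph for empty sign strings; its name is hypothetical. It returns
@code{positive_sign} unchanged for nonnegative amounts, and returns
@samp{-} for a negative amount only when both strings are empty.

```c
/* Hypothetical helper: choose the sign string for a monetary amount
   from the positive_sign and negative_sign members of struct lconv.  */
const char *
money_sign_string (int negative, const char *positive_sign,
                   const char *negative_sign)
{
  if (!negative)
    return positive_sign;     /* printed as is, even if empty */
  if (negative_sign[0] == '\0' && positive_sign[0] == '\0')
    return "-";               /* never drop the sign entirely */
  return negative_sign;
}
```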

The ANSI standard doesn't say what to do when you find this value; we
recommend printing @code{positive_sign} as you find it, even if it is
empty. For a negative value, print @code{negative_sign} as you find it
unless both it and @code{positive_sign} are empty, in which case print
@samp{-} instead. (Failing to indicate the sign at all seems rather

@item char p_sign_posn
@itemx char n_sign_posn
These members have values that are small integers indicating how to
position the sign for nonnegative and negative monetary quantities,
respectively. (The string used by the sign is what was specified with
@code{positive_sign} or @code{negative_sign}.) The possible values are

The currency symbol and quantity should be surrounded by parentheses.

Print the sign string before the quantity and currency symbol.

Print the sign string after the quantity and currency symbol.

Print the sign string right before the currency symbol.

Print the sign string right after the currency symbol.

``Unspecified''. Both members have this value in the standard

The ANSI standard doesn't say what you should do when the value is
@code{CHAR_MAX}. We recommend you print the sign after the currency

It is not clear whether you should let these members apply to the
international currency format or not. POSIX says you should, but
intuition plus the examples in the ANSI C standard suggest you should
not. We hope that someone who knows the conventions for formatting
monetary quantities well will tell us what we should recommend.
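
As an illustration of how @code{p_sign_posn} and @code{n_sign_posn}
combine with @code{p_cs_precedes} and @code{n_cs_precedes}, here is a
sketch with a hypothetical name. It follows the ANSI C meanings of the
values: 0 puts parentheses around the quantity and currency symbol, 1
puts the sign before both, 2 after both, 3 immediately before the
currency symbol, and 4 immediately after it. @code{CHAR_MAX}, like any
other out-of-range value, is treated as 4, as recommended above. Spacing
(@code{p_sep_by_space} and @code{n_sep_by_space}) is left out to keep the
sketch short, and it is meant for @code{currency_symbol} only.

```c
#include <stdio.h>

/* Hypothetical helper.  AMOUNT is the formatted quantity without a
   sign, SYMBOL the currency symbol, SIGN the string chosen from
   positive_sign or negative_sign.  CS_PRECEDES and SIGN_POSN are the
   p_ or n_ members matching the amount's sign.  */
int
place_money_sign (char *buf, size_t size, const char *amount,
                  const char *symbol, int cs_precedes, int sign_posn,
                  const char *sign)
{
  char cs[64];      /* currency symbol, with the sign if it goes there */
  char body[128];   /* quantity and currency symbol in their order */

  /* CHAR_MAX ("unspecified") and invalid values: sign after symbol.  */
  if (sign_posn < 0 || sign_posn > 4)
    sign_posn = 4;

  if (sign_posn == 3)
    snprintf (cs, sizeof cs, "%s%s", sign, symbol);
  else if (sign_posn == 4)
    snprintf (cs, sizeof cs, "%s%s", symbol, sign);
  else
    snprintf (cs, sizeof cs, "%s", symbol);

  if (cs_precedes)
    snprintf (body, sizeof body, "%s%s", cs, amount);
  else
    snprintf (body, sizeof body, "%s%s", amount, cs);

  switch (sign_posn)
    {
    case 0:
      return snprintf (buf, size, "(%s)", body);
    case 1:
      return snprintf (buf, size, "%s%s", sign, body);
    case 2:
      return snprintf (buf, size, "%s%s", body, sign);
    default:        /* 3 and 4: the sign is already next to the symbol */
      return snprintf (buf, size, "%s", body);
    }
}
```

For example, with the symbol @samp{$} preceding the amount @samp{5} and
the sign @samp{-}, the values 0 through 4 produce @samp{($5)},
@samp{-$5}, @samp{$5-}, @samp{-$5}, and @samp{$-5}.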