Marc de Bourget
2016-02-11 11:46:21 UTC
I use this version:
http://sourceforge.net/projects/ezwinports/files/gawk-4.1.3-w32-bin.zip/download
Problem: This GAWK for Windows version counts bytes instead of characters.
"Céline" has 6 characters but 7 bytes in UTF-8, due to the multibyte character "é".
The length function for the string "Céline" should return 6, but it returns 7.
Using gawk for Windows with UTF-8 produces wrong results for at least the
functions length, substr, index, match, split("Céline", CHARS, ""), printf,
sprintf.
Setting the environment variable LC_ALL in a DOS batch file doesn't
help:
celine.bat:
SET LC_ALL=en_US.UTF-8
gawk -f celine.awk
Content of celine.awk:
BEGIN {
test = "Céline"
print length(test)
print substr(test,2,1)
print "|" sprintf("%-12.12s", test) "|"
}
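For comparison, here is a minimal Python sketch of what the three statements in celine.awk should give when counting characters, next to what byte counting gives. It is only an illustration of the difference, not gawk's implementation; the variable names are made up for this example, and the byte-wise values for substr and sprintf are what I would expect from byte counting, not captured gawk output.

```python
# Character-wise vs. byte-wise results for the string used in celine.awk.
test = "Céline"
raw = test.encode("utf-8")      # the same text as UTF-8 bytes

# length(test)
char_len = len(test)            # counts characters
byte_len = len(raw)             # counts bytes ("é" takes two)

# substr(test, 2, 1)
char_sub = test[1:2]            # the second character
byte_sub = raw[1:2]             # the second byte, i.e. only half of "é"

# sprintf("%-12.12s", test): truncate to 12, then pad on the right to 12
char_pad = "|" + test[:12].ljust(12) + "|"
byte_pad = b"|" + raw[:12].ljust(12) + b"|"

print(char_len, byte_len)
print(repr(char_sub), repr(byte_sub))
print(len(char_pad), len(byte_pad.decode("utf-8")))
```

With byte counting, the padded field holds 12 bytes but only 11 characters, so the closing "|" ends up one column too far to the left.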