[bug-gawk] How does awk implement extremely long integers?

Discussion:

Peng Yu

2012-01-31 15:03:05 UTC

Hi,

I don't see where the limit of an integer in gawk is documented. But I
tried the following code on my MacBook Pro (64bit) with macports
installation. It can store integer as large as 2^1023. (I suspect the
maximum should be something like 2^1024-1, but I never test it). I'm
wondering how gawk store and compute some big integer, as an integer
of this large is beyond any native type supported my the machine.

If some library is used to support long integers, why not use one that
supports infinitely large integer?

~$ seq 1023|awk 'BEGIN{i=1}{i*=2}END{print i}'
89884656743115795386465259539451236680898848947115328636715040578866337902750481566354238661203768010560056939935696678829394884407208311246423715319737062188883946712432742638151109800623047059726541476042502884419075341171231440736956555270413618581675255342293149119973622969239858152417678164812112068608

~$ seq 1024|awk 'BEGIN{i=1}{i*=2}END{print i+2}'
inf

--
Regards,
Peng

Nelson H. F. Beebe

2012-01-31 17:53:30 UTC

Permalink

Peng Yu <***@gmail.com> asks about the size of an integer
in gawk.

All gawk implementations (gawk, nawk, awk, mawk, tawk, jawk, ...) by
default use a numeric type implemented as a double, which on modern
systems is always the IEEE 754 64-bit format. It has a 53-bit
significand, with a separate sign, so numbers in the range
[0, 2**53 - 1] can be represented exactly.

For the applications for which awk was intended, that is mostly adequate.
I have private versions of mawk and nawk that have been extended for
80-bit and 128-bit long double IEEE 754 formats, and also for 128-bit
IEEE 754-2008 decimal arithmetic. See

http://www.math.utah.edu/pub/mathcw/

if you want to experiment with extended arithmetic in awk.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: ***@math.utah.edu -
- 155 S 1400 E RM 233 ***@acm.org ***@computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

John Haque

2012-02-01 15:14:47 UTC

Permalink

Hi.

Post by Nelson H. F. Beebe
All gawk implementations (gawk, nawk, awk, mawk, tawk, jawk, ...) by
default use a numeric type implemented as a double, which on modern
systems is always the IEEE 754 64-bit format. It has a 53-bit
significand, with a separate sign, so numbers in the range
[0, 2**53 - 1] can be represented exactly.
For the applications for which awk was intended, that is mostly adequate.
I have private versions of mawk and nawk that have been extended for
80-bit and 128-bit long double IEEE 754 formats, and also for 128-bit
IEEE 754-2008 decimal arithmetic. See
http://www.math.utah.edu/pub/mathcw/

I assume the modified gcc has additional format specifiers to print
"long" integers?

The GNU mpfr library does not have any built-in facility
to output "long" integers what I tell after looking at the online
doc. BTW, if I am not mistaken, gawk used to print long integers as
floating point numbers in the old days. Is this going to be an issue
for gawk?

$ gawk --version
GNU Awk 3.1.5

$ gawk 'BEGIN { print 1000*342413245}'
3.42413e+11

Thanks,

John

Davide Brini

2012-01-31 18:10:24 UTC

Permalink

Post by Peng Yu
Hi,
I don't see where the limit of an integer in gawk is documented. But I
tried the following code on my MacBook Pro (64bit) with macports
installation. It can store integer as large as 2^1023. (I suspect the
maximum should be something like 2^1024-1, but I never test it). I'm
wondering how gawk store and compute some big integer, as an integer
of this large is beyond any native type supported my the machine.
If some library is used to support long integers, why not use one that
supports infinitely large integer?
~$ seq 1023|awk 'BEGIN{i=1}{i*=2}END{print i}'
89884656743115795386465259539451236680898848947115328636715040578866337902750481566354238661203768010560056939935696678829394884407208311246423715319737062188883946712432742638151109800623047059726541476042502884419075341171231440736956555270413618581675255342293149119973622969239858152417678164812112068608
~$ seq 1024|awk 'BEGIN{i=1}{i*=2}END{print i+2}'
inf

Note that there are well-known algorithms to implement integers of
arbitrary size, regardless of what the underlying architecture supports
(google for "bignum"). However, in the case of gawk the reason is that gawk
internally uses floating point (ie, probably C's "double" type) to
represent all kinds of numeric data, including integers.
There are some well-known issues with that, the main one being loss of
precision (for example, in your code above, try printing i and i+1 at each
iteration, and you'll see that, at least from a certain point onwards, awk
thinks they're the same. On a system I can access now, that happens
somewhere between 2^52 and 2^53).

For the details see the gawk manual:

http://www.gnu.org/software/gawk/manual/gawk.html#Floating-Point-Issues

--
D.

Aharon Robbins

2012-01-31 18:50:42 UTC

Permalink

Hi.

Date: Tue, 31 Jan 2012 09:03:05 -0600
Subject: [bug-gawk] How does awk implement extremely long integers?
Hi,
I don't see where the limit of an integer in gawk is documented. But I
tried the following code on my MacBook Pro (64bit) with macports
installation. It can store integer as large as 2^1023. (I suspect the
maximum should be something like 2^1024-1, but I never test it). I'm
wondering how gawk store and compute some big integer, as an integer
of this large is beyond any native type supported my the machine.

As others have mentioned, awk uses double precision floating point.
Some of the issues are documented in the gawk manual.

If some library is used to support long integers, why not use one that
supports infinitely large integer?

Except that it doesn't use such a library. I would like to, at some
point, but it's definitely major surgery to do so.

Thanks,

Arnold

a***@skeeve.com

2012-02-01 17:27:42 UTC

Permalink

Hi.

Date: Wed, 1 Feb 2012 09:14:47 -0600
Subject: Re: [bug-gawk] How does awk implement extremely long integers?
Hi.

I assume the modified gcc has additional format specifiers to print
"long" integers?

I think so.

The GNU mpfr library does not have any built-in facility
to output "long" integers what I tell after looking at the online
doc.

Understanding how to support printf formats for MPFR is just one of the
challenges to adding support for it.

BTW, if I am not mistaken, gawk used to print long integers as
floating point numbers in the old days. Is this going to be an issue
for gawk?
$ gawk --version
GNU Awk 3.1.5
$ gawk 'BEGIN { print 1000*342413245}'
3.42413e+11

Yes, in the Olden Days, if a value was out of range for an integer,
gawk fell back to %g. This was better than what other awks did, which
was to print MAX_INT.

At some point (undoubtedly it can be found in the ChangeLog), I moved
to using printf %.0f which prints values as full integers. This follows
a more literal interpretation of the standard, which says that "print"
prints integral values as integers, without saying anything about ranges.

Arnold

John Haque

2012-02-01 17:48:32 UTC

Permalink

Post by a***@skeeve.com

Post by John Haque
I assume the modified gcc has additional format specifiers to print
"long" integers?

I think so.

Post by John Haque
The GNU mpfr library does not have any built-in facility
to output "long" integers what I tell after looking at the online
doc.

Understanding how to support printf formats for MPFR is just one of the
challenges to adding support for it.

One of many more like hashing for arrays. Have to make sure these
are treated as strings and not numbers. It is highly unlikely that
there will be any benifit of special array types speedwise.

Post by a***@skeeve.com

Post by John Haque
BTW, if I am not mistaken, gawk used to print long integers as
floating point numbers in the old days. Is this going to be an issue
for gawk?

Yes, in the Olden Days, if a value was out of range for an integer,
gawk fell back to %g.

And loss of precision.

My google-fu is week today. What is the relationship between gmp and
mpfr libraries. None of these library docs are very explicit about this, and
I found statements which contradicts each other. Also, seems
like other languages have bindings for both.
The gmp library has formatting routines for "long" integers,
but the rest of the stuff looks same as mpfr ???

Thanks,

John

Nelson H. F. Beebe

2012-02-02 15:35:51 UTC

Permalink

Post by John Haque
I assume the modified gcc has additional format specifiers to print
"long" integers?

It is not the compiler, but rather the library, that contains input
and output support. There are extensions to the printf() and
scanf() families in -lmcw that support 80-bit and 128-bit binary
floating point, and 32-bit, 64-bit, and 128-bit decimal floating
point. The integer support is as with C99: int, long int, and
long long int. There are no arbitrary-precision integers: that
is what -lgmp and -lmpfr are for.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: ***@math.utah.edu -
- 155 S 1400 E RM 233 ***@acm.org ***@computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

Nelson H. F. Beebe

2012-02-02 15:38:08 UTC

Permalink

What is the relationship between gmp and mpfr libraries.

GMP provides the basic multiple-precision arithmetic. MPFR builds
on that to provide correctly-rounded results. MPC builds on both
to provide complex arithmetic. See

http://www.mpfr.org/

-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: ***@math.utah.edu -
- 155 S 1400 E RM 233 ***@acm.org ***@computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------