joel's programming fun: Using GMP to Approximate π

Okay, now I have

explained the fundamental process of approximating π by numerical methods from the calculus,
talked a little about loss of precision when using bc,
shown a C language program for the approximation and
talked a little about the loss of precision in the C standard libraries.

I guess I can now use the method as a way to show a little about how to use the Gnu Multi-Precision library.

The Gnu Multi-Precision library, GMP:

Here are

If your native language is not English, the wikipedia English wikipedia article has links to wikipedia articles in other languages. It's not much beyond an overview, but it should help a little, and those pages link some non-English resources.

Wish I had time to write this all in Japanese, or even to try to figure out how to annotate what I've written properly.

Once you get used to GMP, you'll probably find MPFR and MPC quite useful, as well. Probably in that order. (MPC builds new functionality onto MPFR which builds new functionality on GMP, if I read the docs correctly.) These add flexibility and methods of dealing with precision issues, and functions for complex numbers.

Note that GMP includes rational arithmetic -- explicit fractions!

Also note that the manual strongly suggests that, for what I'm doing here with multi-precision math, the MPFR library is more complete. If I were attempting a formal discussion of precision, yes, I would go with MPFR. But I'm not, and we need to start somewhere that's not too hard, so I'm starting with GMP.)

First, I converted the program I wrote using C's standard floating point math to the gmp libraries. The result is here:
https://osdn.net/users/reiisi/pastebin/4462

[JMR201705122219:

I should note that the program is not an efficient or fast way to calculate π. It's more of an easy demonstration of overly simplified use of multi-precision arithmetic. And, even there, I don't track the precision very carefully.

]

You can refer to my blog posts above for general explanations of what the program does. And comments in the source code should explain much of the differences between C's standard floating point math and the handling of GMP math, but I should probably add a few notes:

In the comments, I pretty much imply that mpf_t is a pointer. It isn't. Sorry. But having already pasted that into the paste buffer, I'll satisfy myself with an explanation here.

The type is a small structure containing a pointer, as is partially explained in the manual entry on memory management. Either way, you have to initialize it before you use it, as you see in the code. (You do have the code open in a text editor, or in another browser window, don't you?)

C++ should (I think) allow using the multi-precision math almost the same as the standard floating point and integer types. I haven't tried, there may be some tricky stuff in loop syntax or such. I'm not a fan of C++, or of the idea of attempting to fold all program functionality onto existing syntax. This may have something to do with my coming from a FORTH background, where inventing new syntax on-the-fly doesn't interfere that much with the day-to-day workflow. (But it may create design nightmares down the road, and that's-a-different-story-sort-of.)

So using the GMP variables requires resigning yourself to C's function call syntax. But C makes that rather convenient, so it shouldn't be a problem, really.

The way they have set it up here, it requires the initialization to be separated from the declaration. Thus, all the variables of types provided by GMP that need to be accessible in the global scope of the main function are declared there at the top of the main function.

(I believe that C11 or somewhere says we should be able to declare them close to their first use, but I'm sticking with the most common idiom here.)

With all the globals declared, we are free to start ordinary procedural stuff, so the first thing I do is set the precision. The function call provided sets a precision that gives at least the number of bits of precision specified, including rounding. (MPFR allows you to be a little more precise so that you can control the loss of precision yourself.)

mpf_set_default_prec ( 67 )

ensures that we have at least 67 bits of precision. I want this to mimic (sort of) the default precision of bc when you call it as "bc -l" and load in the default math library.

You should want to know how I chose that 67 bits, so I'll tell you.

One binary digit (also called a "bit" -- b~~inary dig~~it) allows you to represent two numbers, usually interpreted as zero and one. That's not enough to represent decimal numbers.

Two bits give you zero to three, which is great for base four. Three bits give you zero to seven which is great for base eight, or octal. Four bits give you zero to fifteen, great for base sixteen, or hexadecimal.

So, if we wanted twenty digits of base sixteen precision, 20 ✕ 4 or eighty bits would be enough. (And we could set it at eighty without causing too many issues, really.)

Since you can also represent zero to nine in four bits, four bits is enough for one digit of binary coded decimal. And twenty binary coded decimal (BCD) would also take eighty bits.

But GMP is not doing BCD. It's doing binary math. Eighty bits of integer binary gives you the ability to represent from 0 to (2⁸⁰ - 1) That's

00000000000000000000000000000000000000000000000000000000000000000000000000000000

11111111111111111111111111111111111111111111111111111111111111111111111111111111

That is easy to convert to hexadecimal. Each group of four binary 0s is one hexadecimal 0, and each group of four binary 1s is one hexadecimal "F". So, in hexadecimal, that's

0x00000000000000000000

0xFFFFFFFFFFFFFFFFFFFF

which probably doesn't look very meaningful to you, unless you play with this stuff for fun like I sometimes do. How can we convert that to decimal numbers that we feel comfortable with? Of course!

me@mymachine:~/games/piArea$ bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
10^20
100000000000000000000
2^80-1
1208925819614629174706175

so it's from 0 to 1,208,925,819,614,629,174,706,175 (using the US centric notation). So, if we plug in 20 digits of 9s and output that in binary, we can count the number of bits. Uggh. ;-)

obase=2
99999999999999999999
1010110101111000111010111100010110101100011000011111111111111111111

But, wait, there's a better way!

The common logarithm (log base 10, or log₁₀) of a number is strongly related to the number of decimal digits that it takes to write:

log₁₀(10000) == 4

The base two logarithm is strongly related to the number of binary digits it takes to write:

log₂(16) == 4

If only bc could give us a way to calculate log₁₀ and log₂. :)

bc gives us the natural logarithm (ln) in the form of "l()"! And any log_b is related to any other log_c like this:

log_b(x) = log_c(x) / log_c(b)

So, let's set the output base back to ten and define log₂:

obase=10
define l2(x) { return l(x)/l(2); }
l2(16)
4.00000000000000000002
l2(10^20)
66.43856189774724695808

If you're using a reasonably modern bc, you'll probably want to call the function "log2" instead of "l2", since "l2" looks like "12":

define log2(x) { return l(x)/l(2); }
l2(10^20)
66.43856189774724695808

Having four tenths of a bit is kind of hard to do, so let's just say 67 instead of 66.4 bits, okay? And GMP gives us some extra bits and, by default, takes care of 4/5 rounding for us, which is a little more convenient here than what bc does. (We'll expect some difference in results at some point.)

Okay, now that the default precision is set, let's initialize all our variables. (We could have explicitly declared the precision on each initialization, but that would not have saved us the long explanation.)

We should note here that, if this program were more complex, especially if we were using GMP variables as local variables to functions other than main(), we would need to remember to clear them after use in order to avoid memory leaks. (I probably should have cleared them explicitly here to emphasize that.)

I avoid this allocation/deallocation problem here by declaring xsquared, ysquared, y, and deltaA in the outer block of main() rather than in the block local to the for loop, as I had in the native C code. (More on those, later.)

The initialize from string functions are really convenient:

mpf_init_set_str( mpf_t variable, char value[], int base )

mpf_init_set_str(deltaX, "0.025", 10);

initializes deltaX with 1/40 at 67 bits of precision (interpreting 0.025 as a base ten fraction).

When we have the value in one mpf_t variable, we can use that value directly in further initializations, as in

mpf_init_set( x, lastX );

And, if we really don't need to specify a value, just want to set the variables up for later use, we can do that, too:

mpf_init( deltaA );

These variables which I declared without initial values are the ones I'm using in the for loop, by the way.

If there are command line parameters (argc > 1), we have to convert the value for deltaX from the parameter. But the parameter is in a string, so that's straightforward:

mpf_set_str( deltaX, argv[ 1 ], 10 )

It's safer to be sure we got a good conversion when we are using unlimited precision, so I am checking the return value for indication of an error now. In the native C version, I figured there would be no harm in a bad conversion, but it probably would have been better to check.

Moving on to the interesting stuff, the for loop is not impacted as much as we might fear:

for ( ; mpf_cmp( x, limit ) <= 0; mpf_add( x, x, deltaX ) )

The loop counter initialization here was completed explicitly when the variable was initialized, so I left it out of the loop. (It's just not a good idea to leave variables lying around declared but uninitialized. Unless you enjoy working with hidden bugs.)

The comparison expression

mpf_cmp( x, limit ) <= 0

is similar to saying

x <= limit

with standard C floating point types. mpf_cmp( a, b ) returns

an integer value less than zero (negative) if a is less than b,
integer zero if a == b (something we should not expect or depend on),
and an integer greater than zero (positive) if a is greater than b.

Thus, the for loop continues until the value of x exceeds the limit.

The function call

mpf_add( x, x, deltaX )

is essentially the same thing as saying

x = x + deltaX

would be for standard C floating point types (except it does not yield the value assigned as a return value), thus, it means "increment x by deltaX".

The function parameters are listed in the order they would be written in our ordinary (modern infix) algebraic expressions, so, for each of add, mul (multiply), sub (subtract), and div (divide), the parameters are the same:

mpf_sub( mpf_t target, mpf_t left_op, mpf_t right_op )

Target gets the result, and the right op is added to, subtracted from, multiplied with, or divided into the left, as if it were written

target = left_op - right_op

in the case of mpf_sub, and the same with the rest. Again, an important difference is that these functions do not return a result, so we can't nest them and we have to deal with intermediate results ourselves.

Now you know why I have xsquared and deltaA as variables in the GMP version. We need some place to put the intermediate values. Thus

ysquared = 1.0 - ( x * x );

becomes

mpf_mul( xsquared, x, x );
mpf_ui_sub( ysquared, 1L, xsquared );

and

area += deltaX * y;

becomes

mpf_mul( deltaA, deltaX, y );
mpf_add( area, area, deltaA );

I'll repeat myself here -- I did not want xsquared, ysquared, y, and deltaA going out of scope in the loop, so I declared and initialized them in the outer block of main().

If I had a really good reason to keep them local to the loop block, I'd have needed to (re-)initialize them at the top of the loop and clear them at the bottom, to be sure the variables would be valid (and non-segment-faulting) and to keep memory from leaking. That takes enough extra time that I'd prefer to avoid it in a loop like this. And, of course, it gives more opportunity for bugs to creep in.

(In practice, I don't know of a compiler that would deliberately move loop local variables from iteration to iteration of a simple loop like this, but depending on that behavior is depending on something that is not promised in the standard. Not a good idea, and you'll forget to do it right in some places it really will matter.)

Finally, we can celebrate that the GMP provides us with a version of printf that handles gmp variables. Making your own output functions is fun, but sometimes you don't need that kind of fun.

And that's about it. Play with the code and have good kinds of fun!

(I may use this as an excuse to show some ways to parse command line parameters, but, if I do, it will be in another post.)

joel's programming fun

Tuesday, November 29, 2016

Using GMP to Approximate π

The Gnu Multi-Precision library, GMP:

No comments:

Post a Comment