github.com/aergoio/aergo@v1.3.1/libtool/src/gmp-6.1.2/doc/gmp.texi

     1  \input texinfo    @c -*-texinfo-*-
     2  @c %**start of header
     3  @setfilename gmp.info
     4  @documentencoding ISO-8859-1
     5  @include version.texi
     6  @settitle GNU MP @value{VERSION}
     7  @synindex tp fn
     8  @iftex
     9  @afourpaper
    10  @end iftex
    11  @comment %**end of header
    12  
    13  @copying
    14  This manual describes how to install and use the GNU multiple precision
    15  arithmetic library, version @value{VERSION}.
    16  
    17  Copyright 1991, 1993-2016 Free Software Foundation, Inc.
    18  
    19  Permission is granted to copy, distribute and/or modify this document under
    20  the terms of the GNU Free Documentation License, Version 1.3 or any later
    21  version published by the Free Software Foundation; with no Invariant Sections,
    22  with the Front-Cover Texts being ``A GNU Manual'', and with the Back-Cover
    23  Texts being ``You have freedom to copy and modify this GNU Manual, like GNU
    24  software''.  A copy of the license is included in
    25  @ref{GNU Free Documentation License}.
    26  @end copying
    27  @c  Note the @ref above must be on one line, a line break in an @ref within
    28  @c  @copying will bomb in recent texinfo.tex (eg. 2004-04-07.08 which comes
    29  @c  with texinfo 4.7), with messages about missing @endcsname.
    30  
    31  
    32  @c  Texinfo version 4.2 or up will be needed to process this file.
    33  @c
    34  @c  The version number and edition number are taken from version.texi provided
    35  @c  by automake (note that it's regenerated only if you configure with
    36  @c  --enable-maintainer-mode).
    37  @c
    38  @c  Notes discussing the present version number of GMP in relation to previous
    39  @c  ones (for instance in the "Compatibility" section) must be updated
    40  @c  manually though.
    41  @c
    42  @c  @cindex entries have been made for function categories and programming
    43  @c  topics.  The "mpn" section is not included in this, because a beginner
    44  @c  looking for "GCD" or something is only going to be confused by pointers to
    45  @c  low level routines.
    46  @c
    47  @c  @cindex entries are present for processors and systems when there's
    48  @c  particular notes concerning them, but not just for everything GMP
    49  @c  supports.
    50  @c
    51  @c  Index entries for files use @code rather than @file, @samp or @option,
    52  @c  since the latter come out with quotes in TeX, which are nice in the text
    53  @c  but don't look so good in index columns.
    54  @c
    55  @c  Tex:
    56  @c
    57  @c  A suitable texinfo.tex is supplied, a newer one should work equally well.
    58  @c
    59  @c  HTML:
    60  @c
    61  @c  Nothing special is done for links to external manuals, they just come out
    62  @c  in the usual makeinfo style, eg. "../libc/Locales.html".  If you have
    63  @c  local copies of such manuals then this is a good thing, if not then you
    64  @c  may want to search-and-replace to some online source.
    65  @c
    66  
    67  @dircategory GNU libraries
    68  @direntry
    69  * gmp: (gmp).                   GNU Multiple Precision Arithmetic Library.
    70  @end direntry
    71  
    72  @c  html <meta name="description" content="...">
    73  @documentdescription
    74  How to install and use the GNU multiple precision arithmetic library, version @value{VERSION}.
    75  @end documentdescription
    76  
    77  @c smallbook
    78  @finalout
    79  @setchapternewpage on
    80  
    81  @ifnottex
    82  @node Top, Copying, (dir), (dir)
    83  @top GNU MP
    84  @end ifnottex
    85  
    86  @iftex
    87  @titlepage
    88  @title GNU MP
    89  @subtitle The GNU Multiple Precision Arithmetic Library
    90  @subtitle Edition @value{EDITION}
    91  @subtitle @value{UPDATED}
    92  
    93  @author by Torbj@"orn Granlund and the GMP development team
    94  @c @email{tg@@gmplib.org}
    95  
    96  @c Include the Distribution inside the titlepage so
    97  @c that headings are turned off.
    98  
    99  @tex
   100  \global\parindent=0pt
   101  \global\parskip=8pt
   102  \global\baselineskip=13pt
   103  @end tex
   104  
   105  @page
   106  @vskip 0pt plus 1filll
   107  @end iftex
   108  
   109  @insertcopying
   110  @ifnottex
   111  @sp 1
   112  @end ifnottex
   113  
   114  @iftex
   115  @end titlepage
   116  @headings double
   117  @end iftex
   118  
   119  @c  Don't bother with contents for html, the menus seem adequate.
   120  @ifnothtml
   121  @contents
   122  @end ifnothtml
   123  
   124  @menu
   125  * Copying::                    GMP Copying Conditions (LGPL).
   126  * Introduction to GMP::        Brief introduction to GNU MP.
   127  * Installing GMP::             How to configure and compile the GMP library.
   128  * GMP Basics::                 What every GMP user should know.
   129  * Reporting Bugs::             How to usefully report bugs.
   130  * Integer Functions::          Functions for arithmetic on signed integers.
   131  * Rational Number Functions::  Functions for arithmetic on rational numbers.
   132  * Floating-point Functions::   Functions for arithmetic on floats.
   133  * Low-level Functions::        Fast functions for natural numbers.
   134  * Random Number Functions::    Functions for generating random numbers.
   135  * Formatted Output::           @code{printf} style output.
   136  * Formatted Input::            @code{scanf} style input.
   137  * C++ Class Interface::        Class wrappers around GMP types.
   138  * Custom Allocation::          How to customize the internal allocation.
   139  * Language Bindings::          Using GMP from other languages.
   140  * Algorithms::                 What happens behind the scenes.
   141  * Internals::                  How values are represented behind the scenes.
   142  
   143  * Contributors::               Who brings you this library?
   144  * References::                 Some useful papers and books to read.
   145  * GNU Free Documentation License::
   146  * Concept Index::
   147  * Function Index::
   148  @end menu
   149  
   150  
   151  @c  @m{T,N} is $T$ in tex or @math{N} otherwise.  This is an easy way to give
   152  @c  different forms for math in tex and info.  Commas in N or T don't work,
   153  @c  but @C{} can be used instead.  \, works in info but not in tex.
   154  @iftex
   155  @macro m {T,N}
   156  @tex$\T\$@end tex
   157  @end macro
   158  @end iftex
   159  @ifnottex
   160  @macro m {T,N}
   161  @math{\N\}
   162  @end macro
   163  @end ifnottex
   164  
   165  @macro C {}
   166  ,
   167  @end macro
   168  
   169  @c  @ms{V,N} is $V_N$ in tex or just vn otherwise.  This suits simple
   170  @c  subscripts like @ms{x,0}.
   171  @iftex
   172  @macro ms {V,N}
   173  @tex$\V\_{\N\}$@end tex
   174  @end macro
   175  @end iftex
   176  @ifnottex
   177  @macro ms {V,N}
   178  \V\\N\
   179  @end macro
   180  @end ifnottex
   181  
   182  @c  @nicode{S} is plain S in info, or @code{S} elsewhere.  This can be used
   183  @c  when the quotes that @code{} gives in info aren't wanted, but the
   184  @c  fontification in tex or html is wanted.  Doesn't work as @nicode{'\\0'}
   185  @c  though (gives two backslashes in tex).
   186  @ifinfo
   187  @macro nicode {S}
   188  \S\
   189  @end macro
   190  @end ifinfo
   191  @ifnotinfo
   192  @macro nicode {S}
   193  @code{\S\}
   194  @end macro
   195  @end ifnotinfo
   196  
   197  @c  @nisamp{S} is plain S in info, or @samp{S} elsewhere.  This can be used
   198  @c  when the quotes that @samp{} gives in info aren't wanted, but the
   199  @c  fontification in tex or html is wanted.
   200  @ifinfo
   201  @macro nisamp {S}
   202  \S\
   203  @end macro
   204  @end ifinfo
   205  @ifnotinfo
   206  @macro nisamp {S}
   207  @samp{\S\}
   208  @end macro
   209  @end ifnotinfo
   210  
   211  @c  Usage: @GMPtimes{}
   212  @c  Give either \times or the word "times".
   213  @tex
   214  \gdef\GMPtimes{\times}
   215  @end tex
   216  @ifnottex
   217  @macro GMPtimes
   218  times
   219  @end macro
   220  @end ifnottex
   221  
   222  @c  Usage: @GMPmultiply{}
   223  @c  Give * in info, or nothing in tex.
   224  @tex
   225  \gdef\GMPmultiply{}
   226  @end tex
   227  @ifnottex
   228  @macro GMPmultiply
   229  *
   230  @end macro
   231  @end ifnottex
   232  
   233  @c  Usage: @GMPabs{x}
   234  @c  Give either |x| in tex, or abs(x) in info or html.
   235  @tex
   236  \gdef\GMPabs#1{|#1|}
   237  @end tex
   238  @ifnottex
   239  @macro GMPabs {X}
   240  @abs{}(\X\)
   241  @end macro
   242  @end ifnottex
   243  
   244  @c  Usage: @GMPfloor{x}
   245  @c  Give either \lfloor x\rfloor in tex, or floor(x) in info or html.
   246  @tex
   247  \gdef\GMPfloor#1{\lfloor #1\rfloor}
   248  @end tex
   249  @ifnottex
   250  @macro GMPfloor {X}
   251  floor(\X\)
   252  @end macro
   253  @end ifnottex
   254  
   255  @c  Usage: @GMPceil{x}
   256  @c  Give either \lceil x\rceil in tex, or ceil(x) in info or html.
   257  @tex
   258  \gdef\GMPceil#1{\lceil #1 \rceil}
   259  @end tex
   260  @ifnottex
   261  @macro GMPceil {X}
   262  ceil(\X\)
   263  @end macro
   264  @end ifnottex
   265  
   266  @c  Math operators already available in tex, made available in info too.
   267  @c  For example @bmod{} can be used in both tex and info.
   268  @ifnottex
   269  @macro bmod
   270  mod
   271  @end macro
   272  @macro gcd
   273  gcd
   274  @end macro
   275  @macro ge
   276  >=
   277  @end macro
   278  @macro le
   279  <=
   280  @end macro
   281  @macro log
   282  log
   283  @end macro
   284  @macro min
   285  min
   286  @end macro
   287  @macro leftarrow
   288  <-
   289  @end macro
   290  @macro rightarrow
   291  ->
   292  @end macro
   293  @end ifnottex
   294  
   295  @c  New math operators.
   296  @c  @abs{} can be used in both tex and info, or just \abs in tex.
   297  @tex
   298  \gdef\abs{\mathop{\rm abs}}
   299  @end tex
   300  @ifnottex
   301  @macro abs
   302  abs
   303  @end macro
   304  @end ifnottex
   305  
   306  @c  @cross{} is a \times symbol in tex, or an "x" in info.  In tex it works
   307  @c  inside or outside $ $.
   308  @tex
   309  \gdef\cross{\ifmmode\times\else$\times$\fi}
   310  @end tex
   311  @ifnottex
   312  @macro cross
   313  x
   314  @end macro
   315  @end ifnottex
   316  
   317  @c  @times{} made available as a "*" in info and html (already works in tex).
   318  @ifnottex
   319  @macro times
   320  *
   321  @end macro
   322  @end ifnottex
   323  
   324  @c  Usage: @W{text}
   325  @c  Like @w{} but working in math mode too.
   326  @tex
   327  \gdef\W#1{\ifmmode{#1}\else\w{#1}\fi}
   328  @end tex
   329  @ifnottex
   330  @macro W {S}
   331  @w{\S\}
   332  @end macro
   333  @end ifnottex
   334  
   335  @c  Usage: \GMPdisplay{text}
   336  @c  Put the given text in an @display style indent, but without turning off
   337  @c  paragraph reflow etc.
   338  @tex
   339  \gdef\GMPdisplay#1{%
   340  \noindent
   341  \advance\leftskip by \lispnarrowing
   342  #1\par}
   343  @end tex
   344  
   345  @c  Usage: \GMPhat
   346  @c  A new \hat that will work in math mode, unlike the texinfo redefined
   347  @c  version.
   348  @tex
   349  \gdef\GMPhat{\mathaccent"705E}
   350  @end tex
   351  
   352  @c  Usage: \GMPraise{text}
   353  @c  For use in a $ $ math expression as an alternative to "^".  This is good
   354  @c  for @code{} in an exponent, since there seems to be no superscript font
   355  @c  for that.
   356  @tex
   357  \gdef\GMPraise#1{\mskip0.5\thinmuskip\hbox{\raise0.8ex\hbox{#1}}}
   358  @end tex
   359  
   360  @c  Usage: @texlinebreak{}
   361  @c  A line break as per @*, but only in tex.
   362  @iftex
   363  @macro texlinebreak
   364  @*
   365  @end macro
   366  @end iftex
   367  @ifnottex
   368  @macro texlinebreak
   369  @end macro
   370  @end ifnottex
   371  
   372  @c  Usage: @maybepagebreak
   373  @c  Allow tex to insert a page break, if it feels the urge.
   374  @c  Normally blocks of @deftypefun/funx are kept together, which can lead to
   375  @c  some poor page break positioning if it's a big block, like the sets of
   376  @c  division functions etc.
   377  @tex
   378  \gdef\maybepagebreak{\penalty0}
   379  @end tex
   380  @ifnottex
   381  @macro maybepagebreak
   382  @end macro
   383  @end ifnottex
   384  
   385  @c  Usage: @GMPreftop{info,title}
   386  @c  Usage: @GMPpxreftop{info,title}
   387  @c
   388  @c  Like @ref{} and @pxref{}, but designed for a reference to the top of a
   389  @c  document, not a particular section.  The TeX output for plain @ref insists
   390  @c  on printing a particular section, GMPreftop gives just the title.
   391  @c
   392  @c  The texinfo manual recommends putting a likely section name in references
   393  @c  like this, eg. "Introduction", but it seems better to just give the title.
   394  @c
   395  @iftex
   396  @macro GMPreftop{info,title}
   397  @i{\title\}
   398  @end macro
   399  @macro GMPpxreftop{info,title}
   400  see @i{\title\}
   401  @end macro
   402  @end iftex
   403  @c
   404  @ifnottex
   405  @macro GMPreftop{info,title}
   406  @ref{Top,\title\,\title\,\info\,\title\}
   407  @end macro
   408  @macro GMPpxreftop{info,title}
   409  @pxref{Top,\title\,\title\,\info\,\title\}
   410  @end macro
   411  @end ifnottex
   412  
   413  
   414  @node Copying, Introduction to GMP, Top, Top
   415  @comment  node-name, next, previous,  up
   416  @unnumbered GNU MP Copying Conditions
   417  @cindex Copying conditions
   418  @cindex Conditions for copying GNU MP
   419  @cindex License conditions
   420  
   421  This library is @dfn{free}; this means that everyone is free to use it and
   422  free to redistribute it on a free basis.  The library is not in the public
   423  domain; it is copyrighted and there are restrictions on its distribution, but
   424  these restrictions are designed to permit everything that a good cooperating
   425  citizen would want to do.  What is not allowed is to try to prevent others
   426  from further sharing any version of this library that they might get from
   427  you.@refill
   428  
   429  Specifically, we want to make sure that you have the right to give away copies
   430  of the library, that you receive source code or else can get it if you want
   431  it, that you can change this library or use pieces of it in new free programs,
   432  and that you know you can do these things.@refill
   433  
   434  To make sure that everyone has such rights, we have to forbid you to deprive
   435  anyone else of these rights.  For example, if you distribute copies of the GNU
   436  MP library, you must give the recipients all the rights that you have.  You
   437  must make sure that they, too, receive or can get the source code.  And you
   438  must tell them their rights.@refill
   439  
   440  Also, for our own protection, we must make certain that everyone finds out
   441  that there is no warranty for the GNU MP library.  If it is modified by
   442  someone else and passed on, we want their recipients to know that what they
   443  have is not what we distributed, so that any problems introduced by others
   444  will not reflect on our reputation.@refill
   445  
   446  More precisely, the GNU MP library is dual licensed, under the conditions of
   447  the GNU Lesser General Public License version 3 (see
   448  @file{COPYING.LESSERv3}), or the GNU General Public License version 2 (see
   449  @file{COPYINGv2}). This is the recipient's choice, and the recipient also has
   450  the additional option of applying later versions of these licenses. (The
   451  reason for this dual licensing is to make it possible to use the library with
   452  programs which are licensed under GPL version 2, but which for historical or
   453  other reasons do not allow use under later versions of the GPL.)
   454  
   455  Programs which are not part of the library itself, such as demonstration
   456  programs and the GMP testsuite, are licensed under the terms of the GNU
   457  General Public License version 3 (see @file{COPYINGv3}), or any later
   458  version.
   459  
   460  
   461  @node Introduction to GMP, Installing GMP, Copying, Top
   462  @comment  node-name,  next,  previous,  up
   463  @chapter Introduction to GNU MP
   464  @cindex Introduction
   465  
   466  GNU MP is a portable library written in C for arbitrary precision arithmetic
   467  on integers, rational numbers, and floating-point numbers.  It aims to provide
   468  the fastest possible arithmetic for all applications that need higher
   469  precision than is directly supported by the basic C types.
   470  
   471  Many applications use just a few hundred bits of precision; but some
   472  applications may need thousands or even millions of bits.  GMP is designed to
   473  give good performance for both, by choosing algorithms based on the sizes of
   474  the operands, and by carefully keeping the overhead at a minimum.
   475  
   476  The speed of GMP is achieved by using fullwords as the basic arithmetic type,
   477  by using sophisticated algorithms, by including carefully optimized assembly
   478  code for the most common inner loops for many different CPUs, and by a general
   479  emphasis on speed (as opposed to simplicity or elegance).
   480  
   481  There is assembly code for these CPUs:
   482  @cindex CPU types
   483  ARM Cortex-A9, Cortex-A15, and generic ARM,
   484  DEC Alpha 21064, 21164, and 21264,
   485  AMD K8 and K10 (sold under many brands, e.g. Athlon64, Phenom, Opteron),
   486  Bulldozer, and Bobcat,
   487  Intel Pentium, Pentium Pro/II/III, Pentium 4, Core2, Nehalem, Sandy Bridge, Haswell, and generic x86,
   488  Intel IA-64,
   489  Motorola/IBM PowerPC 32 and 64 such as POWER970, POWER5, POWER6, and POWER7,
   490  MIPS 32-bit and 64-bit,
   491  SPARC 32-bit and 64-bit with special support for all UltraSPARC models.
   492  There is also assembly code for many obsolete CPUs.
   493  
   494  
   495  @cindex Home page
   496  @cindex Web page
   497  @noindent
   498  For up-to-date information on GMP, please see the GMP web pages at
   499  
   500  @display
   501  @uref{https://gmplib.org/}
   502  @end display
   503  
   504  @cindex Latest version of GMP
   505  @cindex Anonymous FTP of latest version
   506  @cindex FTP of latest version
   507  @noindent
   508  The latest version of the library is available at
   509  
   510  @display
   511  @uref{https://ftp.gnu.org/gnu/gmp/}
   512  @end display
   513  
   514  Many sites around the world mirror @samp{ftp.gnu.org}; please use a mirror
   515  near you.  See @uref{https://www.gnu.org/order/ftp.html} for a full list.
   516  
   517  @cindex Mailing lists
   518  There are three public mailing lists of interest: one for release
   519  announcements, one for general questions and discussions about usage of the GMP
   520  library, and one for bug reports.  For more information, see
   521  
   522  @display
   523  @uref{https://gmplib.org/mailman/listinfo/}.
   524  @end display
   525  
   526  The proper place for bug reports is @email{gmp-bugs@@gmplib.org}.  See
   527  @ref{Reporting Bugs} for information about reporting bugs.
   528  
   529  @sp 1
   530  @section How to use this Manual
   531  @cindex About this manual
   532  
   533  Everyone should read @ref{GMP Basics}.  If you need to install the library
   534  yourself, then read @ref{Installing GMP}.  If you have a system with multiple
   535  ABIs, then read @ref{ABI and ISA}, for the compiler options that must be used
   536  on applications.
   537  
   538  The rest of the manual can be used for later reference, although it is
   539  probably a good idea to glance through it.
   540  
   541  
   542  @node Installing GMP, GMP Basics, Introduction to GMP, Top
   543  @comment  node-name,  next,  previous,  up
   544  @chapter Installing GMP
   545  @cindex Installing GMP
   546  @cindex Configuring GMP
   547  @cindex Building GMP
   548  
   549  GMP has an autoconf/automake/libtool based configuration system.  On a
   550  Unix-like system a basic build can be done with
   551  
   552  @example
   553  ./configure
   554  make
   555  @end example
   556  
   557  @noindent
   558  Some self-tests can be run with
   559  
   560  @example
   561  make check
   562  @end example
   563  
   564  @noindent
   565  And you can install (under @file{/usr/local} by default) with
   566  
   567  @example
   568  make install
   569  @end example
   570  
   571  If you experience problems, please report them to @email{gmp-bugs@@gmplib.org}.
   572  See @ref{Reporting Bugs}, for information on what to include in useful bug
   573  reports.
   574  
   575  @menu
   576  * Build Options::
   577  * ABI and ISA::
   578  * Notes for Package Builds::
   579  * Notes for Particular Systems::
   580  * Known Build Problems::
   581  * Performance optimization::
   582  @end menu
   583  
   584  
   585  @node Build Options, ABI and ISA, Installing GMP, Installing GMP
   586  @section Build Options
   587  @cindex Build options
   588  
   589  All the usual autoconf configure options are available; run @samp{./configure
   590  --help} for a summary.  The file @file{INSTALL.autoconf} has some generic
   591  installation information too.
   592  
   593  @table @asis
   594  @item Tools
   595  @cindex Non-Unix systems
   596  @samp{configure} requires various Unix-like tools.  See @ref{Notes for
   597  Particular Systems}, for some options on non-Unix systems.
   598  
   599  It might be possible to build without the help of @samp{configure}, certainly
   600  all the code is there, but unfortunately you'll be on your own.
   601  
   602  @item Build Directory
   603  @cindex Build directory
   604  To compile in a separate build directory, @command{cd} to that directory, and
   605  prefix the configure command with the path to the GMP source directory.  For
   606  example
   607  
   608  @example
   609  cd /my/build/dir
   610  /my/sources/gmp-@value{VERSION}/configure
   611  @end example
   612  
   613  Not all @samp{make} programs have the necessary features (@code{VPATH}) to
   614  support this.  In particular, SunOS and Solaris @command{make} have bugs that
   615  make them unable to build in a separate directory.  Use GNU @command{make}
   616  instead.
   617  
   618  @item @option{--prefix} and @option{--exec-prefix}
   619  @cindex Prefix
   620  @cindex Exec prefix
   621  @cindex Install prefix
   622  @cindex @code{--prefix}
   623  @cindex @code{--exec-prefix}
   624  The @option{--prefix} option can be used in the normal way to direct GMP to
   625  install under a particular tree.  The default is @samp{/usr/local}.
   626  
   627  @option{--exec-prefix} can be used to direct architecture-dependent files like
   628  @file{libgmp.a} to a different location.  This can be used to share
   629  architecture-independent parts like the documentation, but separate the
   630  dependent parts.  Note however that @file{gmp.h} is
   631  architecture-dependent since it encodes certain aspects of @file{libgmp}, so
   632  it will be necessary to ensure both @file{$prefix/include} and
   633  @file{$exec_prefix/include} are available to the compiler.
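
        As a rough sketch, assuming hypothetical install locations
        @file{/opt/gmp} and @file{/opt/gmp/x86_64} and a hypothetical
        application source file @file{prog.c}, a split install and a matching
        application build might look like

        @example
        ./configure --prefix=/opt/gmp --exec-prefix=/opt/gmp/x86_64
        make
        make install
        cc -I/opt/gmp/include -I/opt/gmp/x86_64/include prog.c \
           -L/opt/gmp/x86_64/lib -lgmp
        @end example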
   634  
   635  @item @option{--disable-shared}, @option{--disable-static}
   636  @cindex @code{--disable-shared}
   637  @cindex @code{--disable-static}
   638  By default both shared and static libraries are built (where possible), but
   639  one or other can be disabled.  Shared libraries result in smaller executables
   640  and permit code sharing between separate running processes, but on some CPUs
   641  are slightly slower, having a small cost on each function call.
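
        For instance, a static-only build, which might suit an application
        that wants to avoid the per-call cost mentioned above, could be
        configured with

        @example
        ./configure --disable-shared
        @end example

        @noindent
        and a shared-only build could use @samp{--disable-static} in the same
        way.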
   642  
   643  @item Native Compilation, @option{--build=CPU-VENDOR-OS}
   644  @cindex Native compilation
   645  @cindex Build system
   646  @cindex @code{--build}
   647  For normal native compilation, the system can be specified with
   648  @samp{--build}.  By default @samp{./configure} uses the output from running
   649  @samp{./config.guess}.  On some systems @samp{./config.guess} can determine
   650  the exact CPU type, on others it will be necessary to give it explicitly.  For
   651  example,
   652  
   653  @example
   654  ./configure --build=ultrasparc-sun-solaris2.7
   655  @end example
   656  
   657  In all cases the @samp{OS} part is important, since it controls how libtool
   658  generates shared libraries.  Running @samp{./config.guess} is the simplest way
   659  to see what it should be, if you don't know already.
   660  
   661  @item Cross Compilation, @option{--host=CPU-VENDOR-OS}
   662  @cindex Cross compiling
   663  @cindex Host system
   664  @cindex @code{--host}
   665  When cross-compiling, the system used for compiling is given by @samp{--build}
   666  and the system where the library will run is given by @samp{--host}.  For
   667  example when using a FreeBSD Athlon system to build GNU/Linux m68k binaries,
   668  
   669  @example
   670  ./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu
   671  @end example
   672  
   673  Compiler tools are sought first with the host system type as a prefix.  For
   674  example @command{m68k-mac-linux-gnu-ranlib} is tried, then plain
   675  @command{ranlib}.  This makes it possible for a set of cross-compiling tools
   676  to co-exist with native tools.  The prefix is the argument to @samp{--host},
   677  and this can be an alias, such as @samp{m68k-linux}.  But note that tools
   678  don't have to be set up this way; it's enough to just have a @env{PATH} with a
   679  suitable cross-compiling @command{cc} etc.
   680  
   681  Compiling for a different CPU in the same family as the build system is a form
   682  of cross-compilation, though very possibly this would merely be special
   683  options on a native compiler.  In any case @samp{./configure} avoids depending
   684  on being able to run code on the build system, which is important when
   685  creating binaries for a newer CPU since they very possibly won't run on the
   686  build system.
   687  
   688  In all cases the compiler must be able to produce an executable (of whatever
   689  format) from a standard C @code{main}.  Although only object files will go to
   690  make up @file{libgmp}, @samp{./configure} uses linking tests for various
   691  purposes, such as determining what functions are available on the host system.
   692  
   693  Currently a warning is given unless an explicit @samp{--build} is used when
   694  cross-compiling, because it may not be possible to correctly guess the build
   695  system type if the @env{PATH} has only a cross-compiling @command{cc}.
   696  
   697  Note that the @samp{--target} option is not appropriate for GMP@.  It's for use
   698  when building compiler tools, with @samp{--host} being where they will run,
   699  and @samp{--target} what they'll produce code for.  Ordinary programs or
   700  libraries like GMP are only interested in the @samp{--host} part, being where
   701  they'll run.  (Some past versions of GMP used @samp{--target} incorrectly.)
   702  
   703  @item CPU types
   704  @cindex CPU types
   705  In general, if you want a library that runs as fast as possible, you should
   706  configure GMP for the exact CPU type your system uses.  However, this may mean
   707  the binaries won't run on older members of the family, and might run slower on
   708  other members, older or newer.  The best idea is always to build GMP for the
   709  exact machine type you intend to run it on.
   710  
   711  The following CPUs have specific support.  See @file{configure.ac} for details
   712  of what code and compiler options they select.
   713  
   714  @itemize @bullet
   715  
   716  @c Keep this formatting, it's easy to read and it can be grepped to
   717  @c automatically test that CPUs listed get through ./config.sub
   718  
   719  @item
   720  Alpha:
   721  @nisamp{alpha},
   722  @nisamp{alphaev5},
   723  @nisamp{alphaev56},
   724  @nisamp{alphapca56},
   725  @nisamp{alphapca57},
   726  @nisamp{alphaev6},
   727  @nisamp{alphaev67},
   728  @nisamp{alphaev68},
   729  @nisamp{alphaev7}
   730  
   731  @item
   732  Cray:
   733  @nisamp{c90},
   734  @nisamp{j90},
   735  @nisamp{t90},
   736  @nisamp{sv1}
   737  
   738  @item
   739  HPPA:
   740  @nisamp{hppa1.0},
   741  @nisamp{hppa1.1},
   742  @nisamp{hppa2.0},
   743  @nisamp{hppa2.0n},
   744  @nisamp{hppa2.0w},
   745  @nisamp{hppa64}
   746  
   747  @item
   748  IA-64:
   749  @nisamp{ia64},
   750  @nisamp{itanium},
   751  @nisamp{itanium2}
   752  
   753  @item
   754  MIPS:
   755  @nisamp{mips},
   756  @nisamp{mips3},
   757  @nisamp{mips64}
   758  
   759  @item
   760  Motorola:
   761  @nisamp{m68k},
   762  @nisamp{m68000},
   763  @nisamp{m68010},
   764  @nisamp{m68020},
   765  @nisamp{m68030},
   766  @nisamp{m68040},
   767  @nisamp{m68060},
   768  @nisamp{m68302},
   769  @nisamp{m68360},
   770  @nisamp{m88k},
   771  @nisamp{m88110}
   772  
   773  @item
   774  POWER:
   775  @nisamp{power},
   776  @nisamp{power1},
   777  @nisamp{power2},
   778  @nisamp{power2sc}
   779  
   780  @item
   781  PowerPC:
   782  @nisamp{powerpc},
   783  @nisamp{powerpc64},
   784  @nisamp{powerpc401},
   785  @nisamp{powerpc403},
   786  @nisamp{powerpc405},
   787  @nisamp{powerpc505},
   788  @nisamp{powerpc601},
   789  @nisamp{powerpc602},
   790  @nisamp{powerpc603},
   791  @nisamp{powerpc603e},
   792  @nisamp{powerpc604},
   793  @nisamp{powerpc604e},
   794  @nisamp{powerpc620},
   795  @nisamp{powerpc630},
   796  @nisamp{powerpc740},
   797  @nisamp{powerpc7400},
   798  @nisamp{powerpc7450},
   799  @nisamp{powerpc750},
   800  @nisamp{powerpc801},
   801  @nisamp{powerpc821},
   802  @nisamp{powerpc823},
   803  @nisamp{powerpc860},
   804  @nisamp{powerpc970}
   805  
   806  @item
   807  SPARC:
   808  @nisamp{sparc},
   809  @nisamp{sparcv8},
   810  @nisamp{microsparc},
   811  @nisamp{supersparc},
   812  @nisamp{sparcv9},
   813  @nisamp{ultrasparc},
   814  @nisamp{ultrasparc2},
   815  @nisamp{ultrasparc2i},
   816  @nisamp{ultrasparc3},
   817  @nisamp{sparc64}
   818  
   819  @item
   820  x86 family:
   821  @nisamp{i386},
   822  @nisamp{i486},
   823  @nisamp{i586},
   824  @nisamp{pentium},
   825  @nisamp{pentiummmx},
   826  @nisamp{pentiumpro},
   827  @nisamp{pentium2},
   828  @nisamp{pentium3},
   829  @nisamp{pentium4},
   830  @nisamp{k6},
   831  @nisamp{k62},
   832  @nisamp{k63},
   833  @nisamp{athlon},
   834  @nisamp{amd64},
   835  @nisamp{viac3},
   836  @nisamp{viac32}
   837  
   838  @item
   839  Other:
   840  @nisamp{arm},
   841  @nisamp{sh},
   842  @nisamp{sh2},
   843  @nisamp{vax}
   844  @end itemize
   845  
   846  CPUs not listed will use generic C code.
   847  
   848  @item Generic C Build
   849  @cindex Generic C
   850  If some of the assembly code causes problems, or if otherwise desired, the
   851  generic C code can be selected with the configure option @option{--disable-assembly}.
   852  
   853  Note that this will run quite slowly, but it should be portable and should at
   854  least make it possible to get something running if all else fails.
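
        For example, a fallback build using only the generic C code could be
        configured and then checked with the self-tests as follows

        @example
        ./configure --disable-assembly
        make
        make check
        @end example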
   855  
   856  @item Fat binary, @option{--enable-fat}
   857  @cindex Fat binary
   858  @cindex @code{--enable-fat}
   859  Using @option{--enable-fat} selects a ``fat binary'' build on x86, where
   860  optimized low level subroutines are chosen at runtime according to the CPU
   861  detected.  This means more code, but gives good performance on all x86 chips.
   862  (This option might become available for more architectures in the future.)
   863  
   864  @item @option{ABI}
   865  @cindex ABI
   866  On some systems GMP supports multiple ABIs (application binary interfaces),
   867  meaning data type sizes and calling conventions.  By default GMP chooses the
   868  best ABI available, but a particular ABI can be selected.  For example
   869  
   870  @example
   871  ./configure --host=mips64-sgi-irix6 ABI=n32
   872  @end example
   873  
   874  See @ref{ABI and ISA}, for the available choices on relevant CPUs, and what
   875  applications need to do.
   876  
   877  @item @option{CC}, @option{CFLAGS}
   878  @cindex C compiler
   879  @cindex @code{CC}
   880  @cindex @code{CFLAGS}
   881  By default the C compiler used is chosen from among some likely candidates,
   882  with @command{gcc} normally preferred if it's present.  The usual
   883  @samp{CC=whatever} can be passed to @samp{./configure} to choose something
   884  different.
   885  
   886  For various systems, default compiler flags are set based on the CPU and
   887  compiler.  The usual @samp{CFLAGS="-whatever"} can be passed to
   888  @samp{./configure} to use something different or to set good flags for systems
   889  GMP doesn't otherwise know.
   890  
   891  The @samp{CC} and @samp{CFLAGS} used are printed during @samp{./configure},
   892  and can be found in each generated @file{Makefile}.  This is the easiest way
   893  to check the defaults when considering changing or adding something.
   894  
   895  Note that when @samp{CC} and @samp{CFLAGS} are specified on a system
   896  supporting multiple ABIs it's important to give an explicit
   897  @samp{ABI=whatever}, since GMP can't determine the ABI just from the flags and
   898  won't be able to select the correct assembly code.
   899  
   900  If just @samp{CC} is selected then normal default @samp{CFLAGS} for that
   901  compiler will be used (if GMP recognises it).  For example @samp{CC=gcc} can
   902  be used to force the use of GCC, with default flags (and default ABI).
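
As a minimal sketch of checking the defaults in a generated @file{Makefile}
as described above, the following shell function prints the @samp{CC} and
@samp{CFLAGS} assignments.  The function name @code{show_cc_cflags} is
hypothetical, and the sketch assumes the usual automake layout where each
variable is assigned on a line of its own, such as @samp{CC = gcc}.

```shell
# show_cc_cflags [MAKEFILE]
# Print the CC and CFLAGS assignments from a generated Makefile
# (defaults to ./Makefile).  Hypothetical helper, not part of GMP.
show_cc_cflags() {
    # Match only "CC =" and "CFLAGS =", not CCLD, CPPFLAGS or similar.
    grep -E '^(CC|CFLAGS)[[:space:]]*=' "${1:-Makefile}"
}
```

After @samp{./configure} has finished, running @samp{show_cc_cflags} in the
top-level build directory shows what was chosen, which is a convenient
starting point before overriding either variable.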
   903  
   904  @item @option{CPPFLAGS}
   905  @cindex @code{CPPFLAGS}
   906  Any flags like @samp{-D} defines or @samp{-I} includes required by the
   907  preprocessor should be set in @samp{CPPFLAGS} rather than @samp{CFLAGS}.
   908  Compiling is done with both @samp{CPPFLAGS} and @samp{CFLAGS}, but
   909  preprocessing uses just @samp{CPPFLAGS}.  This distinction is because most
   910  preprocessors won't accept all the flags the compiler does.  Preprocessing is
   911  done separately in some configure tests.
   912  
   913  @item @option{CC_FOR_BUILD}
   914  @cindex @code{CC_FOR_BUILD}
   915  Some build-time programs are compiled and run to generate host-specific data
   916  tables.  @samp{CC_FOR_BUILD} is the compiler used for this.  It doesn't need
   917  to be in any particular ABI or mode, it merely needs to generate executables
   918  that can run.  The default is to try the selected @samp{CC} and some likely
   919  candidates such as @samp{cc} and @samp{gcc}, looking for something that works.
   920  
   921  No flags are used with @samp{CC_FOR_BUILD} because a simple invocation like
   922  @samp{cc foo.c} should be enough.  If some particular options are required
   923  they can be included as for instance @samp{CC_FOR_BUILD="cc -whatever"}.
   924  
   925  @item C++ Support, @option{--enable-cxx}
   926  @cindex C++ support
   927  @cindex @code{--enable-cxx}
   928  C++ support in GMP can be enabled with @samp{--enable-cxx}, in which case a
   929  C++ compiler will be required.  As a convenience @samp{--enable-cxx=detect}
   930  can be used to enable C++ support only if a compiler can be found.  The C++
   931  support consists of a library @file{libgmpxx.la} and header file
   932  @file{gmpxx.h} (@pxref{Headers and Libraries}).
   933  
   934  A separate @file{libgmpxx.la} has been adopted rather than having C++ objects
   935  within @file{libgmp.la} in order to ensure dynamic linked C programs aren't
   936  bloated by a dependency on the C++ standard library, and to avoid any chance
   937  that the C++ compiler could be required when linking plain C programs.
   938  
   939  @file{libgmpxx.la} will use certain internals from @file{libgmp.la} and can
   940  only be expected to work with @file{libgmp.la} from the same GMP version.
   941  Future changes to the relevant internals will be accompanied by renaming, so a
   942  mismatch will cause unresolved symbols rather than perhaps mysterious
   943  misbehaviour.
   944  
   945  In general @file{libgmpxx.la} will be usable only with the C++ compiler that
   946  built it, since name mangling and runtime support are usually incompatible
   947  between different compilers.
   948  
   949  @item @option{CXX}, @option{CXXFLAGS}
   950  @cindex C++ compiler
   951  @cindex @code{CXX}
   952  @cindex @code{CXXFLAGS}
   953  When C++ support is enabled, the C++ compiler and its flags can be set with
   954  variables @samp{CXX} and @samp{CXXFLAGS} in the usual way.  The default for
   955  @samp{CXX} is the first compiler that works from a list of likely candidates,
   956  with @command{g++} normally preferred when available.  The default for
   957  @samp{CXXFLAGS} is to try @samp{CFLAGS}, @samp{CFLAGS} without @samp{-g}, then
   958  for @command{g++} either @samp{-g -O2} or @samp{-O2}, or for other compilers
   959  @samp{-g} or nothing.  Trying @samp{CFLAGS} this way is convenient when using
   960  @samp{gcc} and @samp{g++} together, since the flags for @samp{gcc} will
   961  usually suit @samp{g++}.
   962  
   963  It's important that the C and C++ compilers match, meaning their startup and
   964  runtime support routines are compatible and that they generate code in the
   965  same ABI (if there's a choice of ABIs on the system).  @samp{./configure}
   966  isn't currently able to check these things very well itself, so for that
   967  reason @samp{--disable-cxx} is the default, to avoid a build failure due to a
   968  compiler mismatch.  Perhaps this will change in the future.
   969  
   970  Incidentally, it's normally not good enough to set @samp{CXX} to the same as
   971  @samp{CC}.  Although @command{gcc} for instance recognises @file{foo.cc} as
   972  C++ code, only @command{g++} will invoke the linker the right way when
   973  building an executable or shared library from C++ object files.
   974  
   975  @item Temporary Memory, @option{--enable-alloca=<choice>}
   976  @cindex Temporary memory
   977  @cindex Stack overflow
   978  @cindex @code{alloca}
   979  @cindex @code{--enable-alloca}
   980  GMP allocates temporary workspace using one of the following three methods,
   981  which can be selected with for instance
   982  @samp{--enable-alloca=malloc-reentrant}.
   983  
   984  @itemize @bullet
   985  @item
   986  @samp{alloca} - C library or compiler builtin.
   987  @item
   988  @samp{malloc-reentrant} - the heap, in a re-entrant fashion.
   989  @item
   990  @samp{malloc-notreentrant} - the heap, with global variables.
   991  @end itemize
   992  
   993  For convenience, the following choices are also available.
   994  @samp{--disable-alloca} is the same as @samp{no}.
   995  
   996  @itemize @bullet
   997  @item
   998  @samp{yes} - a synonym for @samp{alloca}.
   999  @item
  1000  @samp{no} - a synonym for @samp{malloc-reentrant}.
  1001  @item
  1002  @samp{reentrant} - @code{alloca} if available, otherwise
  1003  @samp{malloc-reentrant}.  This is the default.
  1004  @item
  1005  @samp{notreentrant} - @code{alloca} if available, otherwise
  1006  @samp{malloc-notreentrant}.
  1007  @end itemize
  1008  
  1009  @code{alloca} is reentrant and fast, and is recommended.  It actually allocates
  1010  just small blocks on the stack; larger ones use malloc-reentrant.
  1011  
  1012  @samp{malloc-reentrant} is, as the name suggests, reentrant and thread safe,
  1013  but @samp{malloc-notreentrant} is faster and should be used if reentrancy is
  1014  not required.
  1015  
  1016  The two malloc methods in fact use the memory allocation functions selected by
  1017  @code{mp_set_memory_functions}, these being @code{malloc} and friends by
  1018  default.  @xref{Custom Allocation}.
  1019  
  1020  An additional choice @samp{--enable-alloca=debug} is available, to help when
  1021  debugging memory related problems (@pxref{Debugging}).
  1022  
  1023  @item FFT Multiplication, @option{--disable-fft}
  1024  @cindex FFT multiplication
  1025  @cindex @code{--disable-fft}
  1026  By default multiplications are done using Karatsuba, 3-way Toom, higher degree
  1027  Toom, and Fermat FFT@.  The FFT is only used on large to very large operands
  1028  and can be disabled to save code size if desired.
  1029  
  1030  @item Assertion Checking, @option{--enable-assert}
  1031  @cindex Assertion checking
  1032  @cindex @code{--enable-assert}
  1033  This option enables some consistency checking within the library.  This can be
  1034  of use while debugging, @pxref{Debugging}.
  1035  
  1036  @item Execution Profiling, @option{--enable-profiling=prof/gprof/instrument}
  1037  @cindex Execution profiling
  1038  @cindex @code{--enable-profiling}
  1039  Enable profiling support, in one of various styles, @pxref{Profiling}.
  1040  
  1041  @item @option{MPN_PATH}
  1042  @cindex @code{MPN_PATH}
  1043  Various assembly versions of each mpn subroutine are provided.  For a given
  1044  CPU, a search is made through a path to choose a version of each.  For example
  1045  @samp{sparcv8} has
  1046  
  1047  @example
  1048  MPN_PATH="sparc32/v8 sparc32 generic"
  1049  @end example
  1050  
  1051  which means look first for v8 code, then plain sparc32 (which is v7), and
  1052  finally fall back on generic C@.  Knowledgeable users with special requirements
  1053  can specify a different path.  Normally this is completely unnecessary.
  1054  
  1055  @item Documentation
  1056  @cindex Documentation formats
  1057  @cindex Texinfo
  1058  The source for the document you're now reading is @file{doc/gmp.texi}, in
  1059  Texinfo format, see @GMPreftop{texinfo, Texinfo}.
  1060  
  1061  @cindex Postscript
  1062  @cindex DVI
  1063  @cindex PDF
  1064  Info format @samp{doc/gmp.info} is included in the distribution.  The usual
  1065  automake targets are available to make PostScript, DVI, PDF and HTML (these
  1066  will require various @TeX{} and Texinfo tools).
  1067  
  1068  @cindex DocBook
  1069  @cindex XML
  1070  DocBook and XML can be generated by the Texinfo @command{makeinfo} program
  1071  too, see @ref{makeinfo options,, Options for @command{makeinfo}, texinfo,
  1072  Texinfo}.
  1073  
  1074  Some supplementary notes can also be found in the @file{doc} subdirectory.
  1075  
  1076  @end table
  1077  
  1078  
  1079  @need 2000
  1080  @node ABI and ISA, Notes for Package Builds, Build Options, Installing GMP
  1081  @section ABI and ISA
  1082  @cindex ABI
  1083  @cindex Application Binary Interface
  1084  @cindex ISA
  1085  @cindex Instruction Set Architecture
  1086  
  1087  ABI (Application Binary Interface) refers to the calling conventions between
  1088  functions, meaning what registers are used and what sizes the various C data
  1089  types are.  ISA (Instruction Set Architecture) refers to the instructions and
  1090  registers a CPU has available.
  1091  
  1092  Some 64-bit ISA CPUs have both a 64-bit ABI and a 32-bit ABI defined, the
  1093  latter for compatibility with older CPUs in the family.  GMP supports some
  1094  CPUs like this in both ABIs.  In fact within GMP @samp{ABI} means a
  1095  combination of chip ABI, plus how GMP chooses to use it.  For example in some
  1096  32-bit ABIs, GMP may support a limb as either a 32-bit @code{long} or a 64-bit
  1097  @code{long long}.
  1098  
  1099  By default GMP chooses the best ABI available for a given system, and this
  1100  generally gives significantly greater speed.  But an ABI can be chosen
  1101  explicitly to make GMP compatible with other libraries, or particular
  1102  application requirements.  For example,
  1103  
  1104  @example
  1105  ./configure ABI=32
  1106  @end example
  1107  
  1108  In all cases it's vital that all object code used in a given program is
  1109  compiled for the same ABI.
  1110  
  1111  Usually a limb is implemented as a @code{long}.  When a @code{long long} limb
  1112  is used this is encoded in the generated @file{gmp.h}.  This is convenient for
  1113  applications, but it does mean that @file{gmp.h} will vary, and can't be just
  1114  copied around.  @file{gmp.h} remains compiler independent though, since all
  1115  compilers for a particular ABI will be expected to use the same limb type.
  1116  
  1117  Currently no attempt is made to follow whatever conventions a system has for
  1118  installing library or header files built for a particular ABI@.  This will
  1119  probably only matter when installing multiple builds of GMP, and it might be
  1120  as simple as configuring with a special @samp{libdir}, or it might require
  1121  more than that.  Note that builds for different ABIs need to be done separately,
  1122  with a fresh @command{./configure} and @command{make} each.
  1123  
  1124  @sp 1
  1125  @table @asis
  1126  @need 1000
  1127  @item AMD64 (@samp{x86_64})
  1128  @cindex AMD64
  1129  On AMD64 systems supporting both 32-bit and 64-bit modes for applications, the
  1130  following ABI choices are available.
  1131  
  1132  @table @asis
  1133  @item @samp{ABI=64}
  1134  The 64-bit ABI uses 64-bit limbs and pointers and makes full use of the chip
  1135  architecture.  This is the default.  Applications will usually not need
  1136  special compiler flags, but for reference the option is
  1137  
  1138  @example
  1139  gcc  -m64
  1140  @end example
  1141  
  1142  @item @samp{ABI=32}
  1143  The 32-bit ABI is the usual i386 conventions.  This will be slower, and is not
  1144  recommended except for inter-operating with other code not yet 64-bit capable.
  1145  Applications must be compiled with
  1146  
  1147  @example
  1148  gcc  -m32
  1149  @end example
  1150  
  1151  (GCC 2.95 and earlier have no @samp{-m32} option, since 32-bit is the only mode they support.)
  1152  
  1153  @item @samp{ABI=x32}
  1154  The x32 ABI uses 64-bit limbs but 32-bit pointers.  Like the 64-bit ABI, it
  1155  makes full use of the chip's arithmetic capabilities.  This ABI is not
  1156  supported by all operating systems.
  1157  
  1158  @example
  1159  gcc  -mx32
  1160  @end example
  1161  
  1162  @end table
  1163  
  1164  @sp 1
  1165  @need 1000
  1166  @item HPPA 2.0 (@samp{hppa2.0*}, @samp{hppa64})
  1167  @cindex HPPA
  1168  @cindex HP-UX
  1169  @table @asis
  1170  @item @samp{ABI=2.0w}
  1171  The 2.0w ABI uses 64-bit limbs and pointers and is available on HP-UX 11 or
  1172  up.  Applications must be compiled with
  1173  
  1174  @example
  1175  gcc [built for 2.0w]
  1176  cc  +DD64
  1177  @end example
  1178  
  1179  @item @samp{ABI=2.0n}
  1180  The 2.0n ABI means the 32-bit HPPA 1.0 ABI and all its normal calling
  1181  conventions, but with 64-bit instructions permitted within functions.  GMP
  1182  uses a 64-bit @code{long long} for a limb.  This ABI is available on hppa64
  1183  GNU/Linux and on HP-UX 10 or higher.  Applications must be compiled with
  1184  
  1185  @example
  1186  gcc [built for 2.0n]
  1187  cc  +DA2.0 +e
  1188  @end example
  1189  
  1190  Note that current versions of GCC (eg.@: 3.2) don't generate 64-bit
  1191  instructions for @code{long long} operations and so may be slower than for
  1192  2.0w.  (The GMP assembly code is the same though.)
  1193  
  1194  @item @samp{ABI=1.0}
  1195  HPPA 2.0 CPUs can run all HPPA 1.0 and 1.1 code in the 32-bit HPPA 1.0 ABI@.
  1196  No special compiler options are needed for applications.
  1197  @end table
  1198  
  1199  All three ABIs are available for CPU types @samp{hppa2.0w}, @samp{hppa2.0} and
  1200  @samp{hppa64}, but for CPU type @samp{hppa2.0n} only 2.0n or 1.0 are
  1201  considered.
  1202  
  1203  Note that GCC on HP-UX has no options to choose between 2.0n and 2.0w modes,
  1204  unlike HP @command{cc}.  Instead it must be built for one or the other ABI@.
  1205  GMP will detect how it was built, and skip to the corresponding @samp{ABI}.
  1206  
  1207  @sp 1
  1208  @need 1500
  1209  @item IA-64 under HP-UX (@samp{ia64*-*-hpux*}, @samp{itanium*-*-hpux*})
  1210  @cindex IA-64
  1211  @cindex HP-UX
  1212  HP-UX supports two ABIs for IA-64.  GMP performance is the same in both.
  1213  
  1214  @table @asis
  1215  @item @samp{ABI=32}
  1216  In the 32-bit ABI, pointers, @code{int}s and @code{long}s are 32 bits and GMP
  1217  uses a 64-bit @code{long long} for a limb.  Applications can be compiled
  1218  without any special flags since this ABI is the default in both HP C and GCC,
  1219  but for reference the flags are
  1220  
  1221  @example
  1222  gcc  -milp32
  1223  cc   +DD32
  1224  @end example
  1225  
  1226  @item @samp{ABI=64}
  1227  In the 64-bit ABI, @code{long}s and pointers are 64 bits and GMP uses a
  1228  @code{long} for a limb.  Applications must be compiled with
  1229  
  1230  @example
  1231  gcc  -mlp64
  1232  cc   +DD64
  1233  @end example
  1234  @end table
  1235  
  1236  On other IA-64 systems, GNU/Linux for instance, @samp{ABI=64} is the only
  1237  choice.
  1238  
  1239  @sp 1
  1240  @need 1000
  1241  @item MIPS under IRIX 6 (@samp{mips*-*-irix[6789]})
  1242  @cindex MIPS
  1243  @cindex IRIX
  1244  IRIX 6 always has a 64-bit MIPS 3 or better CPU, and supports ABIs o32, n32,
  1245  and 64.  n32 or 64 are recommended, and GMP performance will be the same in
  1246  each.  The default is n32.
  1247  
  1248  @table @asis
  1249  @item @samp{ABI=o32}
  1250  The o32 ABI is 32-bit pointers and integers, and no 64-bit operations.  GMP
  1251  will be slower than in n32 or 64; this option exists only to support old
  1252  compilers, eg.@: GCC 2.7.2.  Applications can be compiled with no special
  1253  flags on an old compiler, or on a newer compiler with
  1254  
  1255  @example
  1256  gcc  -mabi=32
  1257  cc   -32
  1258  @end example
  1259  
  1260  @item @samp{ABI=n32}
  1261  The n32 ABI is 32-bit pointers and integers, but with a 64-bit limb using a
  1262  @code{long long}.  Applications must be compiled with
  1263  
  1264  @example
  1265  gcc  -mabi=n32
  1266  cc   -n32
  1267  @end example
  1268  
  1269  @item @samp{ABI=64}
  1270  The 64-bit ABI is 64-bit pointers and integers.  Applications must be compiled
  1271  with
  1272  
  1273  @example
  1274  gcc  -mabi=64
  1275  cc   -64
  1276  @end example
  1277  @end table
  1278  
  1279  Note that MIPS GNU/Linux, as of kernel version 2.2, doesn't have the necessary
  1280  support for n32 or 64 and so only gets a 32-bit limb and the MIPS 2 code.
  1281  
  1282  @sp 1
  1283  @need 1000
  1284  @item PowerPC 64 (@samp{powerpc64}, @samp{powerpc620}, @samp{powerpc630}, @samp{powerpc970}, @samp{power4}, @samp{power5})
  1285  @cindex PowerPC
  1286  @table @asis
  1287  @item @samp{ABI=mode64}
  1288  @cindex AIX
  1289  The AIX 64 ABI uses 64-bit limbs and pointers and is the default on PowerPC 64
  1290  @samp{*-*-aix*} systems.  Applications must be compiled with
  1291  
  1292  @example
  1293  gcc  -maix64
  1294  xlc  -q64
  1295  @end example
  1296  
  1297  On 64-bit GNU/Linux, BSD, and Mac OS X/Darwin systems, the applications must
  1298  be compiled with
  1299  
  1300  @example
  1301  gcc  -m64
  1302  @end example
  1303  
  1304  @item @samp{ABI=mode32}
  1305  The @samp{mode32} ABI uses a 64-bit @code{long long} limb but with the chip
  1306  still in 32-bit mode and using 32-bit calling conventions.  This is the default
  1307  for systems where the true 64-bit ABI is unavailable.  No special compiler
  1308  options are typically needed for applications.  This ABI is not available under
  1309  AIX.
  1310  
  1311  @item @samp{ABI=32}
  1312  This is the basic 32-bit PowerPC ABI, with a 32-bit limb.  No special compiler
  1313  options are needed for applications.
  1314  @end table
  1315  
  1316  GMP's speed is greatest for the @samp{mode64} ABI; the @samp{mode32} ABI is second
  1317  best.  In @samp{ABI=32} only the 32-bit ISA is used and this doesn't make full
  1318  use of a 64-bit chip.
  1319  
  1320  @sp 1
  1321  @need 1000
  1322  @item Sparc V9 (@samp{sparc64}, @samp{sparcv9}, @samp{ultrasparc*})
  1323  @cindex Sparc V9
  1324  @cindex Solaris
  1325  @cindex Sun
  1326  @table @asis
  1327  @item @samp{ABI=64}
  1328  The 64-bit V9 ABI is available on the various BSD sparc64 ports, recent
  1329  versions of Sparc64 GNU/Linux, and Solaris 2.7 and up (when the kernel is in
  1330  64-bit mode).  GCC 3.2 or higher, or Sun @command{cc} is required.  On
  1331  GNU/Linux, depending on the default @command{gcc} mode, applications must be
  1332  compiled with
  1333  
  1334  @example
  1335  gcc  -m64
  1336  @end example
  1337  
  1338  On Solaris applications must be compiled with
  1339  
  1340  @example
  1341  gcc  -m64 -mptr64 -Wa,-xarch=v9 -mcpu=v9
  1342  cc   -xarch=v9
  1343  @end example
  1344  
  1345  On the BSD sparc64 systems no special options are required, since 64-bits is
  1346  the only ABI available.
  1347  
  1348  @item @samp{ABI=32}
  1349  For the basic 32-bit ABI, GMP still uses as much of the V9 ISA as it can.  In
  1350  the Sun documentation this combination is known as ``v8plus''.  On GNU/Linux,
  1351  depending on the default @command{gcc} mode, applications may need to be
  1352  compiled with
  1353  
  1354  @example
  1355  gcc  -m32
  1356  @end example
  1357  
  1358  On Solaris, no special compiler options are required for applications, though
  1359  using something like the following is recommended.  (@command{gcc} 2.8 and
  1360  earlier only support @samp{-mv8} though.)
  1361  
  1362  @example
  1363  gcc  -mv8plus
  1364  cc   -xarch=v8plus
  1365  @end example
  1366  @end table
  1367  
  1368  GMP speed is greatest in @samp{ABI=64}, so it's the default where available.
  1369  The speed is partly because there are extra registers available and partly
  1370  because 64-bits is considered the more important case and has therefore had
  1371  better code written for it.
  1372  
  1373  Don't be confused by the names of the @samp{-m} and @samp{-x} compiler
  1374  options; they're called @samp{arch} but effectively control both ABI and ISA@.
  1375  
  1376  On Solaris 2.6 and earlier, only @samp{ABI=32} is available since the kernel
  1377  doesn't save all registers.
  1378  
  1379  On Solaris 2.7 with the kernel in 32-bit mode, a normal native build will
  1380  reject @samp{ABI=64} because the resulting executables won't run.
  1381  @samp{ABI=64} can still be built if desired by making it look like a
  1382  cross-compile, for example
  1383  
  1384  @example
  1385  ./configure --build=none --host=sparcv9-sun-solaris2.7 ABI=64
  1386  @end example
  1387  @end table
  1388  
  1389  
  1390  @need 2000
  1391  @node Notes for Package Builds, Notes for Particular Systems, ABI and ISA, Installing GMP
  1392  @section Notes for Package Builds
  1393  @cindex Build notes for binary packaging
  1394  @cindex Packaged builds
  1395  
  1396  GMP should present no great difficulties for packaging in a binary
  1397  distribution.
  1398  
  1399  @cindex Libtool versioning
  1400  @cindex Shared library versioning
  1401  Libtool is used to build the library and @samp{-version-info} is set
  1402  appropriately, having started from @samp{3:0:0} in GMP 3.0 (@pxref{Versioning,
  1403  Library interface versions, Library interface versions, libtool, GNU
  1404  Libtool}).
  1405  
  1406  The GMP 4 series will be upwardly binary compatible in each release and will
  1407  be upwardly binary compatible with all of the GMP 3 series.  Additional
  1408  function interfaces may be added in each release, so on systems where libtool
  1409  versioning is not fully checked by the loader an auxiliary mechanism may be
  1410  needed to express that a dynamic linked application depends on a new enough
  1411  GMP.
  1412  
  1413  An auxiliary mechanism may also be needed to express that @file{libgmpxx.la}
  1414  (from @option{--enable-cxx}, @pxref{Build Options}) requires @file{libgmp.la}
  1415  from the same GMP version, since this is not done by the libtool versioning,
  1416  nor otherwise.  A mismatch will result in unresolved symbols from the linker,
  1417  or perhaps the loader.
  1418  
  1419  When building a package for a CPU family, care should be taken to use
  1420  @samp{--host} (or @samp{--build}) to choose the least common denominator among
  1421  the CPUs which might use the package.  For example this might mean plain
  1422  @samp{sparc} (meaning V7) for SPARCs.
  1423  
  1424  For x86s, @option{--enable-fat} sets things up for a fat binary build, making a
  1425  runtime selection of optimized low level routines.  This is a good choice for
  1426  packaging to run on a range of x86 chips.
  1427  
  1428  Users who care about speed will want GMP built for their exact CPU type, to
  1429  make best use of the available optimizations.  Providing a way to suitably
  1430  rebuild a package may be useful.  This could be as simple as making it
  1431  possible for a user to omit @samp{--build} (and @samp{--host}) so
  1432  @samp{./config.guess} will detect the CPU@.  But a way to manually specify a
  1433  @samp{--build} will be wanted for systems where @samp{./config.guess} is
  1434  inexact.
  1435  
  1436  On systems with multiple ABIs, a packaged build will need to decide which
  1437  among the choices is to be provided, see @ref{ABI and ISA}.  A given run of
  1438  @samp{./configure} etc will only build one ABI@.  If a second ABI is also
  1439  required then a second run of @samp{./configure} etc must be made, starting
  1440  from a clean directory tree (@samp{make distclean}).
  1441  
  1442  As noted under ``ABI and ISA'', currently no attempt is made to follow system
  1443  conventions for install locations that vary with ABI, such as
  1444  @file{/usr/lib/sparcv9} for @samp{ABI=64} as opposed to @file{/usr/lib} for
  1445  @samp{ABI=32}.  A package build can override @samp{libdir} and other standard
  1446  variables as necessary.
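
The two-pass build described above can be sketched as a small shell function.
This is only an illustration under stated assumptions: the function name
@code{build_gmp_abi} is hypothetical, it is meant to be run from the top of the
GMP source tree, and the library directories passed to it are the Sparc V9
examples given above rather than a recommendation.

```shell
# build_gmp_abi ABI LIBDIR
# Configure, build, check and install GMP for one ABI.
# Hypothetical helper; run it from the top of the GMP source tree.
build_gmp_abi() {
    abi=$1
    libdir=$2
    # Start from a clean tree if an earlier configure run left a Makefile.
    if [ -f Makefile ]; then
        make distclean || return 1
    fi
    ./configure ABI="$abi" --libdir="$libdir" || return 1
    make || return 1
    make check || return 1
    make install
}
```

A package script might then call @samp{build_gmp_abi 32 /usr/lib} followed by
@samp{build_gmp_abi 64 /usr/lib/sparcv9}.  Both passes install a
@file{gmp.h} into the same include directory unless that is overridden too, so
the second header replaces the first.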
  1447  
  1448  Note that @file{gmp.h} is a generated file, and will be architecture and ABI
  1449  dependent.  When attempting to install two ABIs simultaneously it will be
  1450  important that an application compile gets the correct @file{gmp.h} for its
  1451  desired ABI@.  If compiler include paths don't vary with ABI options then it
  1452  might be necessary to create a @file{/usr/include/gmp.h} which tests
  1453  preprocessor symbols and chooses the correct actual @file{gmp.h}.
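
One possible shape for such a wrapper is sketched below as a shell function
that a package install script could use to write it.  Everything named here is
an assumption for illustration: the function name @code{write_gmp_wrapper}, and
the file names @file{gmp-32.h} and @file{gmp-64.h} standing for the generated
@file{gmp.h} files from the two builds, renamed by the packager.  The test on
@code{_LP64} and @code{__LP64__} only distinguishes ABIs that differ in the
size of @code{long} and pointers (for instance @samp{ABI=32} and
@samp{ABI=64} on Sparc V9), and relies on the compiler predefining one of those
symbols in its 64-bit mode.

```shell
# write_gmp_wrapper INCLUDEDIR
# Write a wrapper gmp.h into INCLUDEDIR that includes the per-ABI header.
# Hypothetical helper; gmp-32.h and gmp-64.h are assumed names for the
# renamed gmp.h files from the ABI=32 and ABI=64 builds.
write_gmp_wrapper() {
    cat > "$1/gmp.h" <<'EOF'
/* Wrapper gmp.h: select the real header for the ABI being compiled. */
#if defined(_LP64) || defined(__LP64__)
#include "gmp-64.h"
#else
#include "gmp-32.h"
#endif
EOF
}
```

The quoted form of @code{#include} is used so that the per-ABI headers are
looked up first in the directory holding the wrapper itself.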
  1454  
  1455  
  1456  @need 2000
  1457  @node Notes for Particular Systems, Known Build Problems, Notes for Package Builds, Installing GMP
  1458  @section Notes for Particular Systems
  1459  @cindex Build notes for particular systems
  1460  @cindex Particular systems
  1461  @cindex Systems
  1462  @table @asis
  1463  
  1464  @c This section is more or less meant for notes about performance or about
  1465  @c build problems that have been worked around but might leave a user
  1466  @c scratching their head.  Fun with different ABIs on a system belongs in the
  1467  @c above section.
  1468  
  1469  @item AIX 3 and 4
  1470  @cindex AIX
  1471  On systems @samp{*-*-aix[34]*} shared libraries are disabled by default, since
  1472  some versions of the native @command{ar} fail on the convenience libraries
  1473  used.  A shared build can be attempted with
  1474  
  1475  @example
  1476  ./configure --enable-shared --disable-static
  1477  @end example
  1478  
  1479  Note that the @samp{--disable-static} is necessary because in a shared build
  1480  libtool makes @file{libgmp.a} a symlink to @file{libgmp.so}, apparently for
  1481  the benefit of old versions of @command{ld} which only recognise @file{.a},
  1482  but unfortunately this is done even if a fully functional @command{ld} is
  1483  available.
  1484  
  1485  @item ARM
  1486  @cindex ARM
  1487  On systems @samp{arm*-*-*}, versions of GCC up to and including 2.95.3 have a
  1488  bug in unsigned division, giving wrong results for some operands.  GMP
  1489  @samp{./configure} will demand GCC 2.95.4 or later.
  1490  
  1491  @item Compaq C++
  1492  @cindex Compaq C++
  1493  Compaq C++ on OSF 5.1 has two flavours of @code{iostream}, a standard one and
  1494  an old pre-standard one (see @samp{man iostream_intro}).  GMP can only use the
  1495  standard one, which unfortunately is not the default but must be selected by
  1496  defining @code{__USE_STD_IOSTREAM}.  Configure with for instance
  1497  
  1498  @example
  1499  ./configure --enable-cxx CPPFLAGS=-D__USE_STD_IOSTREAM
  1500  @end example
  1501  
  1502  @item Floating Point Mode
  1503  @cindex Floating point mode
  1504  @cindex Hardware floating point mode
  1505  @cindex Precision of hardware floating point
  1506  @cindex x87
  1507  On some systems, the hardware floating point has a control mode which can set
  1508  all operations to be done in a particular precision, for instance single,
  1509  double or extended on x86 systems (x87 floating point).  The GMP functions
  1510  involving a @code{double} cannot be expected to operate to their full
  1511  precision when the hardware is in single precision mode.  Of course this
  1512  affects all code, including application code, not just GMP.
  1513  
  1514  @item FreeBSD 7.x, 8.x, 9.0, 9.1, 9.2
  1515  @cindex FreeBSD
  1516  @command{m4} in these releases of FreeBSD has an eval function which ignores
  1517  its 2nd and 3rd arguments, which makes it unsuitable for @file{.asm} file
  1518  processing.  @samp{./configure} will detect the problem and either abort or
  1519  choose another m4 in the @env{PATH}.  The bug is fixed in FreeBSD 9.3 and 10.0,
  1520  so either upgrade or use GNU m4.  Note that the FreeBSD package system installs
  1521  GNU m4 under the name @samp{gm4}, which GMP cannot guess.
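
To see in advance whether a given @command{m4} is affected, the behaviour
described above can be probed directly.  The following is a sketch only, not
the test @samp{./configure} itself performs: the function name
@code{m4_eval_ok} is hypothetical, and it relies on @code{eval(255,16)}
printing @samp{ff} when the radix argument is honoured and @samp{255} when the
2nd and 3rd arguments are ignored.

```shell
# m4_eval_ok M4
# Succeed if the given m4 program honours the radix argument of eval.
# Hypothetical helper, not part of GMP.
m4_eval_ok() {
    # A working eval prints "ff"; one that ignores its 2nd and 3rd
    # arguments prints "255".  A missing program prints nothing.
    [ "$(echo 'eval(255,16)' | "$1" 2>/dev/null)" = "ff" ]
}

# Report the first usable candidate found on the PATH, if any.
for candidate in m4 gm4; do
    if m4_eval_ok "$candidate"; then
        echo "usable m4: $candidate"
        break
    fi
done
```

If only @samp{gm4} passes, it still has to be made known to
@samp{./configure}.  Passing it as an @samp{M4} variable is one possibility,
on the assumption that this configure script accepts one;
@samp{./configure --help} lists the variables it honours.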
  1522  
  1523  @item FreeBSD 7.x, 8.x, 9.x
  1524  @cindex FreeBSD
  1525  GMP releases starting with 6.0 do not support @samp{ABI=32} on FreeBSD/amd64
  1526  prior to release 10.0 of the system.  The cause is a broken @code{limits.h},
  1527  which GMP no longer works around.
  1528  
  1529  @item MS-DOS and MS Windows
  1530  @cindex MS-DOS
  1531  @cindex MS Windows
  1532  @cindex Windows
  1533  @cindex Cygwin
  1534  @cindex DJGPP
  1535  @cindex MINGW
  1536  On an MS-DOS system DJGPP can be used to build GMP, and on an MS Windows
  1537  system Cygwin, DJGPP and MINGW can be used.  All three are excellent ports of
  1538  GCC and the various GNU tools.
  1539  
  1540  @display
  1541  @uref{http://www.cygwin.com/}
  1542  @uref{http://www.delorie.com/djgpp/}
  1543  @uref{http://www.mingw.org/}
  1544  @end display
  1545  
  1546  @cindex Interix
  1547  @cindex Services for Unix
  1548  Microsoft also publishes an Interix ``Services for Unix'' which can be used to
  1549  build GMP on Windows (with a normal @samp{./configure}), but it's not free
  1550  software.
  1551  
  1552  @item MS Windows DLLs
  1553  @cindex DLLs
  1554  @cindex MS Windows
  1555  @cindex Windows
  1556  On systems @samp{*-*-cygwin*}, @samp{*-*-mingw*} and @samp{*-*-pw32*} by
  1557  default GMP builds only a static library, but a DLL can be built instead using
  1558  
  1559  @example
  1560  ./configure --disable-static --enable-shared
  1561  @end example
  1562  
  1563  Static and DLL libraries can't both be built, since certain export directives
  1564  in @file{gmp.h} must be different.
  1565  
  1566  A MINGW DLL build of GMP can be used with Microsoft C@.  Libtool doesn't
  1567  install a @file{.lib} format import library, but it can be created with MS
  1568  @command{lib} as follows, and copied to the install directory.  Similarly for
  1569  @file{libmp} and @file{libgmpxx}.
  1570  
  1571  @example
  1572  cd .libs
  1573  lib /def:libgmp-3.dll.def /out:libgmp-3.lib
  1574  @end example
  1575  
  1576  MINGW uses the C runtime library @samp{msvcrt.dll} for I/O, so applications
  1577  wanting to use the GMP I/O routines must be compiled with @samp{cl /MD} to do
  1578  the same.  If one of the other C runtime library choices provided by MS C is
  1579  desired then the suggestion is to use the GMP string functions and confine I/O
  1580  to the application.
  1581  
  1582  @item Motorola 68k CPU Types
  1583  @cindex 68000
  1584  @samp{m68k} is taken to mean 68000.  @samp{m68020} or higher will give a
  1585  performance boost on applicable CPUs.  @samp{m68360} can be used for CPU32
  1586  series chips.  @samp{m68302} can be used for ``Dragonball'' series chips,
  1587  though this is merely a synonym for @samp{m68000}.
  1588  
  1589  @item NetBSD 5.x
  1590  @cindex NetBSD
  1591  @command{m4} in these releases of NetBSD has an eval function which ignores its
  1592  2nd and 3rd arguments, which makes it unsuitable for @file{.asm} file
  1593  processing.  @samp{./configure} will detect the problem and either abort or
  1594  choose another m4 in the @env{PATH}.  The bug is fixed in NetBSD 6, so either
  1595  upgrade or use GNU m4.  Note that the NetBSD package system installs GNU m4
  1596  under the name @samp{gm4}, which GMP cannot guess.
  1597  
  1598  @item OpenBSD 2.6
  1599  @cindex OpenBSD
  1600  @command{m4} in this release of OpenBSD has a bug in @code{eval} that makes it
  1601  unsuitable for @file{.asm} file processing.  @samp{./configure} will detect
  1602  the problem and either abort or choose another m4 in the @env{PATH}.  The bug
  1603  is fixed in OpenBSD 2.7, so either upgrade or use GNU m4.
  1604  
  1605  @item Power CPU Types
  1606  @cindex Power/PowerPC
  1607  In GMP, CPU types @samp{power*} and @samp{powerpc*} will each use instructions
  1608  not available on the other, so it's important to choose the right one for the
  1609  CPU that will be used.  Currently GMP has no assembly code support for using
  1610  just the common instruction subset.  To get executables that run on both, the
  1611  current suggestion is to use the generic C code (@option{--disable-assembly}),
  1612  possibly with appropriate compiler options (like @samp{-mcpu=common} for
  1613  @command{gcc}).  CPU @samp{rs6000} (which is not a CPU but a family of
  1614  workstations) is accepted by @file{config.sub}, but is currently equivalent to
  1615  @option{--disable-assembly}.
  1616  
  1617  @item Sparc CPU Types
  1618  @cindex Sparc
  1619  @samp{sparcv8} or @samp{supersparc} on relevant systems will give a
  1620  significant performance increase over the V7 code selected by plain
  1621  @samp{sparc}.
  1622  
  1623  @item Sparc App Regs
  1624  @cindex Sparc
  1625  The GMP assembly code for both 32-bit and 64-bit Sparc clobbers the
  1626  ``application registers'' @code{g2}, @code{g3} and @code{g4}, the same way
  1627  that the GCC default @samp{-mapp-regs} does (@pxref{SPARC Options,, SPARC
  1628  Options, gcc, Using the GNU Compiler Collection (GCC)}).
  1629  
  1630  This makes that code unsuitable for use with the special V9
  1631  @samp{-mcmodel=embmedany} (which uses @code{g4} as a data segment pointer), and
  1632  for applications wanting to use those registers for special purposes.  In these
  1633  cases the only suggestion currently is to build GMP with
  1634  @option{--disable-assembly} to avoid the assembly code.
  1635  
  1636  @item SunOS 4
  1637  @cindex SunOS
  1638  @command{/usr/bin/m4} lacks various features needed to process @file{.asm}
  1639  files, and instead @samp{./configure} will automatically use
  1640  @command{/usr/5bin/m4}, which we believe is always available (if not then use
  1641  GNU m4).
  1642  
  1643  @item x86 CPU Types
  1644  @cindex x86
  1645  @cindex 80x86
  1646  @cindex i386
  1647  @samp{i586}, @samp{pentium} or @samp{pentiummmx} code is good for its intended
  1648  P5 Pentium chips, but quite slow when run on Intel P6 class chips (PPro, P-II,
  1649  P-III)@.  @samp{i386} is a better choice when making binaries that must run on
  1650  both.
  1651  
  1652  @item x86 MMX and SSE2 Code
  1653  @cindex MMX
  1654  @cindex SSE2
  1655  If the CPU selected has MMX code but the assembler doesn't support it, a
  1656  warning is given and non-MMX code is used instead.  This will be an inferior
  1657  build, since the MMX code that's present is there because it's faster than the
  1658  corresponding plain integer code.  The same applies to SSE2.
  1659  
  1660  Old versions of @samp{gas} don't support MMX instructions; in particular,
  1661  version 1.92.3, which comes with FreeBSD 2.2.8 and the more recent OpenBSD
  1662  3.1, doesn't.
  1663  
  1664  Solaris 2.6 and 2.7 @command{as} generate incorrect object code for register
  1665  to register @code{movq} instructions, and so can't be used for MMX code.
  1666  Install a recent @command{gas} if MMX code is wanted on these systems.
  1667  @end table
  1668  
  1669  
  1670  @need 2000
  1671  @node Known Build Problems, Performance optimization, Notes for Particular Systems, Installing GMP
  1672  @section Known Build Problems
  1673  @cindex Build problems known
  1674  
  1675  @c This section is more or less meant for known build problems that are not
  1676  @c otherwise worked around and require some sort of manual intervention.
  1677  
  1678  You might find more up-to-date information at @uref{https://gmplib.org/}.
  1679  
  1680  @table @asis
  1681  @item Compiler link options
  1682  The version of libtool currently in use rather aggressively strips compiler
  1683  options when linking a shared library.  This will hopefully be relaxed in the
  1684  future, but for now if this is a problem the suggestion is to create a little
  1685  script to hide them, and for instance configure with
  1686  
  1687  @example
  1688  ./configure CC=gcc-with-my-options
  1689  @end example
  1690  
  1691  @item DJGPP (@samp{*-*-msdosdjgpp*})
  1692  @cindex DJGPP
  1693  The DJGPP port of @command{bash} 2.03 is unable to run the @samp{configure}
  1694  script; it exits silently, having died writing a preamble to
  1695  @file{config.log}.  Use @command{bash} 2.04 or higher.
  1696  
  1697  @samp{make all} was found to run out of memory during the final
  1698  @file{libgmp.la} link on one system tested, despite having 64MB available.
  1699  Running @samp{make libgmp.la} directly helped; perhaps recursing into the
  1700  various subdirectories uses up memory.
  1701  
  1702  @item GNU binutils @command{strip} prior to 2.12
  1703  @cindex Stripped libraries
  1704  @cindex Binutils @command{strip}
  1705  @cindex GNU @command{strip}
  1706  @command{strip} from GNU binutils 2.11 and earlier should not be used on the
  1707  static libraries @file{libgmp.a} and @file{libmp.a} since it will discard all
  1708  but the last of multiple archive members with the same name, like the three
  1709  versions of @file{init.o} in @file{libgmp.a}.  Binutils 2.12 or higher can be
  1710  used successfully.
  1711  
  1712  The shared libraries @file{libgmp.so} and @file{libmp.so} are not affected by
  1713  this and any version of @command{strip} can be used on them.
  1714  
  1715  @item @command{make} syntax error
  1716  @cindex SCO
  1717  @cindex IRIX
  1718  On certain versions of SCO OpenServer 5 and IRIX 6.5 the native @command{make}
  1719  is unable to handle the long dependencies list for @file{libgmp.la}.  The
  1720  symptom is a ``syntax error'' on the following line of the top-level
  1721  @file{Makefile}.
  1722  
  1723  @example
  1724  libgmp.la: $(libgmp_la_OBJECTS) $(libgmp_la_DEPENDENCIES)
  1725  @end example
  1726  
  1727  Either use GNU Make, or as a workaround remove
  1728  @code{$(libgmp_la_DEPENDENCIES)} from that line (which will make the initial
  1729  build work, but if any recompiling is done @file{libgmp.la} might not be
  1730  rebuilt).
  1731  
  1732  @item MacOS X (@samp{*-*-darwin*})
  1733  @cindex MacOS X
  1734  @cindex Darwin
  1735  Libtool currently only knows how to create shared libraries on MacOS X using
  1736  the native @command{cc} (which is a modified GCC), not a plain GCC@.  A
  1737  static-only build should work though (@samp{--disable-shared}).
  1738  
  1739  @item NeXT prior to 3.3
  1740  @cindex NeXT
  1741  The system compiler on old versions of NeXT was a massacred and old GCC, even
  1742  if it called itself @file{cc}.  This compiler cannot be used to build GMP; you
  1743  need to get a real GCC and install that.  (NeXT may have fixed this in
  1744  release 3.3 of their system.)
  1745  
  1746  @item POWER and PowerPC
  1747  @cindex Power/PowerPC
  1748  Bugs in GCC 2.7.2 (and 2.6.3) mean it can't be used to compile GMP on POWER or
  1749  PowerPC@.  If you want to use GCC for these machines, get GCC 2.7.2.1 (or
  1750  later).
  1751  
  1752  @item Sequent Symmetry
  1753  @cindex Sequent Symmetry
  1754  Use the GNU assembler instead of the system assembler, since the latter has
  1755  serious bugs.
  1756  
  1757  @item Solaris 2.6
  1758  @cindex Solaris
  1759  The system @command{sed} prints an error ``Output line too long'' when libtool
  1760  builds @file{libgmp.la}.  This doesn't seem to cause any obvious ill effects,
  1761  but GNU @command{sed} is recommended, to avoid any doubt.
  1762  
  1763  @item Sparc Solaris 2.7 with gcc 2.95.2 in @samp{ABI=32}
  1764  @cindex Solaris
  1765  A shared library build of GMP seems to fail in this combination: it builds but
  1766  then fails the tests, apparently due to some incorrect data relocations within
  1767  @code{gmp_randinit_lc_2exp_size}.  The exact cause is unknown, so
  1768  @samp{--disable-shared} is recommended.
  1769  @end table
  1770  
  1771  
  1772  @need 2000
  1773  @node Performance optimization, , Known Build Problems, Installing GMP
  1774  @section Performance optimization
  1775  @cindex Optimizing performance
  1776  
  1777  @c At some point, this should perhaps move to a separate chapter on optimizing
  1778  @c performance.
  1779  
  1780  For optimal performance, build GMP for the exact CPU type of the target
  1781  computer, see @ref{Build Options}.
  1782  
  1783  Unlike most other programs, GMP typically doesn't depend much on the
  1784  compiler for its speed, since it uses assembly language for the most critical
  1785  operations.
  1786  
  1787  For long-running GMP applications in particular, and for applications
  1788  demanding extremely large numbers, building and running the @code{tuneup}
  1789  program in the @file{tune} subdirectory can be important.  For example,
  1790  
  1791  @example
  1792  cd tune
  1793  make tuneup
  1794  ./tuneup
  1795  @end example
  1796  
  1797  will generate better contents for the @file{gmp-mparam.h} parameter file.
  1798  
  1799  To use the results, put the output in the file indicated in the
  1800  @samp{Parameters for ...} header.  Then recompile from scratch.
  1801  
  1802  The @code{tuneup} program takes one useful parameter, @samp{-f NNN}, which
  1803  instructs the program how long to check FFT multiply parameters.  If you're
  1804  going to use GMP for extremely large numbers, you may want to run @code{tuneup}
  1805  with a large NNN value.
  1806  
  1807  
  1808  @node GMP Basics, Reporting Bugs, Installing GMP, Top
  1809  @comment  node-name,  next,  previous,  up
  1810  @chapter GMP Basics
  1811  @cindex Basics
  1812  
  1813  @strong{Using functions, macros, data types, etc.@: not documented in this
  1814  manual is strongly discouraged.  If you do so your application is guaranteed
  1815  to be incompatible with future versions of GMP.}
  1816  
  1817  @menu
  1818  * Headers and Libraries::
  1819  * Nomenclature and Types::
  1820  * Function Classes::
  1821  * Variable Conventions::
  1822  * Parameter Conventions::
  1823  * Memory Management::
  1824  * Reentrancy::
  1825  * Useful Macros and Constants::
  1826  * Compatibility with older versions::
  1827  * Demonstration Programs::
  1828  * Efficiency::
  1829  * Debugging::
  1830  * Profiling::
  1831  * Autoconf::
  1832  * Emacs::
  1833  @end menu
  1834  
  1835  @node Headers and Libraries, Nomenclature and Types, GMP Basics, GMP Basics
  1836  @section Headers and Libraries
  1837  @cindex Headers
  1838  
  1839  @cindex @file{gmp.h}
  1840  @cindex Include files
  1841  @cindex @code{#include}
  1842  All declarations needed to use GMP are collected in the include file
  1843  @file{gmp.h}.  It is designed to work with both C and C++ compilers.
  1844  
  1845  @example
  1846  #include <gmp.h>
  1847  @end example
  1848  
  1849  @cindex @code{stdio.h}
  1850  Note however that prototypes for GMP functions with @code{FILE *} parameters
  1851  are only provided if @code{<stdio.h>} is included too.
  1852  
  1853  @example
  1854  #include <stdio.h>
  1855  #include <gmp.h>
  1856  @end example
  1857  
  1858  @cindex @code{stdarg.h}
  1859  Likewise @code{<stdarg.h>} is required for prototypes with @code{va_list}
  1860  parameters, such as @code{gmp_vprintf}.  And @code{<obstack.h>} for prototypes
  1861  with @code{struct obstack} parameters, such as @code{gmp_obstack_printf}, when
  1862  available.
  1863  
  1864  @cindex Libraries
  1865  @cindex Linking
  1866  @cindex @code{libgmp}
  1867  All programs using GMP must link against the @file{libgmp} library.  On a
  1868  typical Unix-like system this can be done with @samp{-lgmp}, for example
  1869  
  1870  @example
  1871  gcc myprogram.c -lgmp
  1872  @end example
  1873  
  1874  @cindex @code{libgmpxx}
  1875  GMP C++ functions are in a separate @file{libgmpxx} library.  This is built
  1876  and installed if C++ support has been enabled (@pxref{Build Options}).  For
  1877  example,
  1878  
  1879  @example
  1880  g++ mycxxprog.cc -lgmpxx -lgmp
  1881  @end example
  1882  
  1883  @cindex Libtool
  1884  GMP is built using Libtool and an application can use that to link if desired,
  1885  @GMPpxreftop{libtool, GNU Libtool}.
  1886  
  1887  If GMP has been installed to a non-standard location then it may be necessary
  1888  to use @samp{-I} and @samp{-L} compiler options to point to the right
  1889  directories, and some sort of run-time path for a shared library.
  1890  
  1891  
  1892  @node Nomenclature and Types, Function Classes, Headers and Libraries, GMP Basics
  1893  @section Nomenclature and Types
  1894  @cindex Nomenclature
  1895  @cindex Types
  1896  
  1897  @cindex Integer
  1898  @tindex @code{mpz_t}
  1899  In this manual, @dfn{integer} usually means a multiple precision integer, as
  1900  defined by the GMP library.  The C data type for such integers is @code{mpz_t}.
  1901  Here are some examples of how to declare such integers:
  1902  
  1903  @example
  1904  mpz_t sum;
  1905  
  1906  struct foo @{ mpz_t x, y; @};
  1907  
  1908  mpz_t vec[20];
  1909  @end example
  1910  
  1911  @cindex Rational number
  1912  @tindex @code{mpq_t}
  1913  @dfn{Rational number} means a multiple precision fraction.  The C data type
  1914  for these fractions is @code{mpq_t}.  For example:
  1915  
  1916  @example
  1917  mpq_t quotient;
  1918  @end example
  1919  
  1920  @cindex Floating-point number
  1921  @tindex @code{mpf_t}
  1922  @dfn{Floating point number}, or @dfn{Float} for short, means an arbitrary
  1923  precision mantissa with a limited precision exponent.  The C data type for
  1924  such objects is @code{mpf_t}.  For example:
  1925  
  1926  @example
  1927  mpf_t fp;
  1928  @end example
  1929  
  1930  @tindex @code{mp_exp_t}
  1931  The floating point functions accept and return exponents in the C type
  1932  @code{mp_exp_t}.  Currently this is usually a @code{long}, but on some systems
  1933  it's an @code{int} for efficiency.
  1934  
  1935  @cindex Limb
  1936  @tindex @code{mp_limb_t}
  1937  A @dfn{limb} means the part of a multi-precision number that fits in a single
  1938  machine word.  (We chose this word because a limb of the human body is
  1939  analogous to a digit, only larger, and containing several digits.)  Normally a
  1940  limb is 32 or 64 bits.  The C data type for a limb is @code{mp_limb_t}.
  1941  
  1942  @tindex @code{mp_size_t}
  1943  Counts of limbs of a multi-precision number are represented in the C type
  1944  @code{mp_size_t}.  Currently this is normally a @code{long}, but on some
  1945  systems it's an @code{int} for efficiency, and on some systems it will be
  1946  @code{long long} in the future.
  1947  
  1948  @tindex @code{mp_bitcnt_t}
  1949  Counts of bits of a multi-precision number are represented in the C type
  1950  @code{mp_bitcnt_t}.  Currently this is always an @code{unsigned long}, but on
  1951  some systems it will be an @code{unsigned long long} in the future.
  1952  
  1953  @cindex Random state
  1954  @tindex @code{gmp_randstate_t}
  1955  @dfn{Random state} means an algorithm selection and current state data.  The C
  1956  data type for such objects is @code{gmp_randstate_t}.  For example:
  1957  
  1958  @example
  1959  gmp_randstate_t rstate;
  1960  @end example
  1961  
  1962  Also, in general @code{mp_bitcnt_t} is used for bit counts and ranges, and
  1963  @code{size_t} is used for byte or character counts.
  1964  
  1965  
  1966  @node Function Classes, Variable Conventions, Nomenclature and Types, GMP Basics
  1967  @section Function Classes
  1968  @cindex Function classes
  1969  
  1970  There are five classes of functions in the GMP library:
  1971  
  1972  @enumerate
  1973  @item
  1974  Functions for signed integer arithmetic, with names beginning with
  1975  @code{mpz_}.  The associated type is @code{mpz_t}.  There are about 150
  1976  functions in this class.  (@pxref{Integer Functions})
  1977  
  1978  @item
  1979  Functions for rational number arithmetic, with names beginning with
  1980  @code{mpq_}.  The associated type is @code{mpq_t}.  There are about 35
  1981  functions in this class, but the integer functions can be used for arithmetic
  1982  on the numerator and denominator separately.  (@pxref{Rational Number
  1983  Functions})
  1984  
  1985  @item
  1986  Functions for floating-point arithmetic, with names beginning with
  1987  @code{mpf_}.  The associated type is @code{mpf_t}.  There are about 70
  1988  functions in this class.  (@pxref{Floating-point Functions})
  1989  
  1990  @item
  1991  Fast low-level functions that operate on natural numbers.  These are used by
  1992  the functions in the preceding groups, and you can also call them directly
  1993  from very time-critical user programs.  These functions' names begin with
  1994  @code{mpn_}.  The associated type is array of @code{mp_limb_t}.  There are
  1995  about 60 (hard-to-use) functions in this class.  (@pxref{Low-level Functions})
  1996  
  1997  @item
  1998  Miscellaneous functions.  Functions for setting up custom allocation and
  1999  functions for generating random numbers.  (@pxref{Custom Allocation}, and
  2000  @pxref{Random Number Functions})
  2001  @end enumerate
  2002  
  2003  
  2004  @node Variable Conventions, Parameter Conventions, Function Classes, GMP Basics
  2005  @section Variable Conventions
  2006  @cindex Variable conventions
  2007  @cindex Conventions for variables
  2008  
  2009  GMP functions generally have output arguments before input arguments.  This
  2010  notation is by analogy with the assignment operator.  The BSD MP compatibility
  2011  functions are exceptions, having the output arguments last.
  2012  
  2013  GMP lets you use the same variable for both input and output in one call.  For
  2014  example, the main function for integer multiplication, @code{mpz_mul}, can be
  2015  used to square @code{x} and put the result back in @code{x} with
  2016  
  2017  @example
  2018  mpz_mul (x, x, x);
  2019  @end example
  2020  
  2021  Before you can assign to a GMP variable, you need to initialize it by calling
  2022  one of the special initialization functions.  When you're done with a
  2023  variable, you need to clear it out, using one of the functions for that
  2024  purpose.  Which function to use depends on the type of variable.  See the
  2025  chapters on integer functions, rational number functions, and floating-point
  2026  functions for details.
  2027  
  2028  A variable should only be initialized once, or at least cleared between each
  2029  initialization.  After a variable has been initialized, it may be assigned to
  2030  any number of times.
  2031  
  2032  For efficiency reasons, avoid excessive initializing and clearing.  In
  2033  general, initialize near the start of a function and clear near the end.  For
  2034  example,
  2035  
  2036  @example
  2037  void
  2038  foo (void)
  2039  @{
  2040    mpz_t  n;
  2041    int    i;
  2042    mpz_init (n);
  2043    for (i = 1; i < 100; i++)
  2044      @{
  2045        mpz_mul (n, @dots{});
  2046        mpz_fdiv_q (n, @dots{});
  2047        @dots{}
  2048      @}
  2049    mpz_clear (n);
  2050  @}
  2051  @end example
  2052  
  2053  
  2054  @node Parameter Conventions, Memory Management, Variable Conventions, GMP Basics
  2055  @section Parameter Conventions
  2056  @cindex Parameter conventions
  2057  @cindex Conventions for parameters
  2058  
  2059  When a GMP variable is used as a function parameter, it's effectively a
  2060  call-by-reference, meaning if the function stores a value there it will change
  2061  the original in the caller.  Parameters which are input-only can be designated
  2062  @code{const} to provoke a compiler error or warning on attempting to modify
  2063  them.
  2064  
  2065  When a function is going to return a GMP result, it should designate a
  2066  parameter that it sets, like the library functions do.  More than one value
  2067  can be returned by having more than one output parameter, again like the
  2068  library functions.  A @code{return} of an @code{mpz_t} etc doesn't return the
  2069  object, only a pointer, and this is almost certainly not what's wanted.
  2070  
  2071  Here's an example accepting an @code{mpz_t} parameter, doing a calculation,
  2072  and storing the result to the indicated parameter.
  2073  
  2074  @example
  2075  void
  2076  foo (mpz_t result, const mpz_t param, unsigned long n)
  2077  @{
  2078    unsigned long  i;
  2079    mpz_mul_ui (result, param, n);
  2080    for (i = 1; i < n; i++)
  2081      mpz_add_ui (result, result, i*7);
  2082  @}
  2083  
  2084  int
  2085  main (void)
  2086  @{
  2087    mpz_t  r, n;
  2088    mpz_init (r);
  2089    mpz_init_set_str (n, "123456", 0);
  2090    foo (r, n, 20L);
  2091    gmp_printf ("%Zd\n", r);
  2092    return 0;
  2093  @}
  2094  @end example
  2095  
  2096  @code{foo} works even if the mainline passes the same variable for
  2097  @code{param} and @code{result}, just like the library functions.  But
  2098  sometimes it's tricky to make that work, and an application might not want to
  2099  bother supporting that sort of thing.
  2100  
  2101  For interest, the GMP types @code{mpz_t} etc are implemented as one-element
  2102  arrays of certain structures.  This is why declaring a variable creates an
  2103  object with the fields GMP needs, but then using it as a parameter passes a
  2104  pointer to the object.  Note that the actual fields in each @code{mpz_t} etc
  2105  are for internal use only and should not be accessed directly by code that
  2106  expects to be compatible with future GMP releases.
  2107  
  2108  
  2109  @need 1000
  2110  @node Memory Management, Reentrancy, Parameter Conventions, GMP Basics
  2111  @section Memory Management
  2112  @cindex Memory management
  2113  
  2114  The GMP types like @code{mpz_t} are small, containing only a couple of sizes,
  2115  and pointers to allocated data.  Once a variable is initialized, GMP takes
  2116  care of all space allocation.  Additional space is allocated whenever a
  2117  variable doesn't have enough.
  2118  
  2119  @code{mpz_t} and @code{mpq_t} variables never reduce their allocated space.
  2120  Normally this is the best policy, since it avoids frequent reallocation.
  2121  Applications that need to return memory to the heap at some particular point
  2122  can use @code{mpz_realloc2}, or clear variables no longer needed.
  2123  
  2124  @code{mpf_t} variables, in the current implementation, use a fixed amount of
  2125  space, determined by the chosen precision and allocated at initialization, so
  2126  their size doesn't change.
  2127  
  2128  All memory is allocated using @code{malloc} and friends by default, but this
  2129  can be changed, see @ref{Custom Allocation}.  Temporary memory on the stack is
  2130  also used (via @code{alloca}), but this can be changed at build-time if
  2131  desired, see @ref{Build Options}.
  2132  
  2133  
  2134  @node Reentrancy, Useful Macros and Constants, Memory Management, GMP Basics
  2135  @section Reentrancy
  2136  @cindex Reentrancy
  2137  @cindex Thread safety
  2138  @cindex Multi-threading
  2139  
  2140  @noindent
  2141  GMP is reentrant and thread-safe, with some exceptions:
  2142  
  2143  @itemize @bullet
  2144  @item
  2145  If configured with @option{--enable-alloca=malloc-notreentrant} (or with
  2146  @option{--enable-alloca=notreentrant} when @code{alloca} is not available),
  2147  then naturally GMP is not reentrant.
  2148  
  2149  @item
  2150  @code{mpf_set_default_prec} and @code{mpf_init} use a global variable for the
  2151  selected precision.  @code{mpf_init2} can be used instead, and in the C++
  2152  interface an explicit precision to the @code{mpf_class} constructor.
  2153  
  2154  @item
  2155  @code{mpz_random} and the other old random number functions use a global
  2156  random state and are hence not reentrant.  The newer random number functions
  2157  that accept a @code{gmp_randstate_t} parameter can be used instead.
  2158  
  2159  @item
  2160  @code{gmp_randinit} (obsolete) returns an error indication through a global
  2161  variable, which is not thread safe.  Applications are advised to use
  2162  @code{gmp_randinit_default} or @code{gmp_randinit_lc_2exp} instead.
  2163  
  2164  @item
  2165  @code{mp_set_memory_functions} uses global variables to store the selected
  2166  memory allocation functions.
  2167  
  2168  @item
  2169  If the memory allocation functions set by a call to
  2170  @code{mp_set_memory_functions} (or @code{malloc} and friends by default) are
  2171  not reentrant, then GMP will not be reentrant either.
  2172  
  2173  @item
  2174  If the standard I/O functions such as @code{fwrite} are not reentrant then the
  2175  GMP I/O functions using them will not be reentrant either.
  2176  
  2177  @item
  2178  It's safe for two threads to read from the same GMP variable simultaneously,
  2179  but it's not safe for one to read while another might be writing, nor for
  2180  two threads to write simultaneously.  It's not safe for two threads to
  2181  generate a random number from the same @code{gmp_randstate_t} simultaneously,
  2182  since this involves an update of that variable.
  2183  @end itemize
  2184  
  2185  
  2186  @need 2000
  2187  @node Useful Macros and Constants, Compatibility with older versions, Reentrancy, GMP Basics
  2188  @section Useful Macros and Constants
  2189  @cindex Useful macros and constants
  2190  @cindex Constants
  2191  
  2192  @deftypevr {Global Constant} {const int} mp_bits_per_limb
  2193  @findex mp_bits_per_limb
  2194  @cindex Bits per limb
  2195  @cindex Limb size
  2196  The number of bits per limb.
  2197  @end deftypevr
  2198  
  2199  @defmac __GNU_MP_VERSION
  2200  @defmacx __GNU_MP_VERSION_MINOR
  2201  @defmacx __GNU_MP_VERSION_PATCHLEVEL
  2202  @cindex Version number
  2203  @cindex GMP version number
  2204  The major and minor GMP version, and patch level, respectively, as integers.
  2205  For GMP i.j, these numbers will be i, j, and 0, respectively.
  2206  For GMP i.j.k, these numbers will be i, j, and k, respectively.
  2207  @end defmac
  2208  
  2209  @deftypevr {Global Constant} {const char * const} gmp_version
  2210  @findex gmp_version
  2211  The GMP version number, as a null-terminated string, in the form ``i.j.k''.
  2212  This release is @nicode{"@value{VERSION}"}.  Note that the format ``i.j'' was
  2213  used, before version 4.3.0, when k was zero.
  2214  @end deftypevr
  2215  
  2216  @defmac __GMP_CC
  2217  @defmacx __GMP_CFLAGS
  2218  The compiler and compiler flags, respectively, used when compiling GMP, as
  2219  strings.
  2220  @end defmac
  2221  
  2222  
  2223  @node Compatibility with older versions, Demonstration Programs, Useful Macros and Constants, GMP Basics
  2224  @section Compatibility with older versions
  2225  @cindex Compatibility with older versions
  2226  @cindex Past GMP versions
  2227  @cindex Upward compatibility
  2228  
  2229  This version of GMP is upwardly binary compatible with all 5.x, 4.x, and 3.x
  2230  versions, and upwardly compatible at the source level with all 2.x versions,
  2231  with the following exceptions.
  2232  
  2233  @itemize @bullet
  2234  @item
  2235  @code{mpn_gcd} had its source arguments swapped as of GMP 3.0, for consistency
  2236  with other @code{mpn} functions.
  2237  
  2238  @item
  2239  @code{mpf_get_prec} counted precision slightly differently in GMP 3.0 and
  2240  3.0.1, but in 3.1 reverted to the 2.x style.
  2241  
  2242  @item
  2243  @code{mpn_bdivmod}, documented as preliminary in GMP 4, has been removed.
  2244  @end itemize
  2245  
  2246  There are a number of compatibility issues between GMP 1 and GMP 2 that of
  2247  course also apply when porting applications from GMP 1 to any later version.
  2248  Please see the GMP 2 manual for details.
  2249  
  2250  @c @item Integer division functions round the result differently.  The obsolete
  2251  @c functions (@code{mpz_div}, @code{mpz_divmod}, @code{mpz_mdiv},
  2252  @c @code{mpz_mdivmod}, etc) now all use floor rounding (i.e., they round the
  2253  @c quotient towards
  2254  @c @ifinfo
  2255  @c @minus{}infinity).
  2256  @c @end ifinfo
  2257  @c @iftex
  2258  @c @tex
  2259  @c $-\infty$).
  2260  @c @end tex
  2261  @c @end iftex
  2262  @c There are a lot of functions for integer division, giving the user better
  2263  @c control over the rounding.
  2264  
  2265  @c @item The function @code{mpz_mod} now compute the true @strong{mod} function.
  2266  
  2267  @c @item The functions @code{mpz_powm} and @code{mpz_powm_ui} now use
  2268  @c @strong{mod} for reduction.
  2269  
  2270  @c @item The assignment functions for rational numbers do no longer canonicalize
  2271  @c their results.  In the case a non-canonical result could arise from an
  2272  @c assignment, the user need to insert an explicit call to
  2273  @c @code{mpq_canonicalize}.  This change was made for efficiency.
  2274  
  2275  @c @item Output generated by @code{mpz_out_raw} in this release cannot be read
  2276  @c by @code{mpz_inp_raw} in previous releases.  This change was made for making
  2277  @c the file format truly portable between machines with different word sizes.
  2278  
  2279  @c @item Several @code{mpn} functions have changed.  But they were intentionally
  2280  @c undocumented in previous releases.
  2281  
  2282  @c @item The functions @code{mpz_cmp_ui}, @code{mpz_cmp_si}, and @code{mpq_cmp_ui}
  2283  @c are now implemented as macros, and thereby sometimes evaluate their
  2284  @c arguments multiple times.
  2285  
  2286  @c @item The functions @code{mpz_pow_ui} and @code{mpz_ui_pow_ui} now yield 1
  2287  @c for 0^0.  (In version 1, they yielded 0.)
  2288  
  2289  @c In version 1 of the library, @code{mpq_set_den} handled negative
  2290  @c denominators by copying the sign to the numerator.  That is no longer done.
  2291  
  2292  @c Pure assignment functions do not canonicalize the assigned variable.  It is
  2293  @c the responsibility of the user to canonicalize the assigned variable before
  2294  @c any arithmetic operations are performed on that variable.
  2295  @c Note that this is an incompatible change from version 1 of the library.
  2296  
  2297  @c @end enumerate
  2298  
  2299  
  2300  @need 1000
  2301  @node Demonstration Programs, Efficiency, Compatibility with older versions, GMP Basics
  2302  @section Demonstration programs
  2303  @cindex Demonstration programs
  2304  @cindex Example programs
  2305  @cindex Sample programs
  2306  The @file{demos} subdirectory has some sample programs using GMP@.  These
  2307  aren't built or installed, but there's a @file{Makefile} with rules for them.
  2308  For instance,
  2309  
  2310  @example
  2311  make pexpr
  2312  ./pexpr 68^975+10
  2313  @end example
  2314  
  2315  @noindent
  2316  The following programs are provided
  2317  
  2318  @itemize @bullet
  2319  @item
  2320  @cindex Expression parsing demo
  2321  @cindex Parsing expressions demo
  2322  @samp{pexpr} is an expression evaluator, the program used on the GMP web page.
  2323  @item
  2324  @cindex Expression parsing demo
  2325  @cindex Parsing expressions demo
  2326  The @samp{calc} subdirectory has a similar but simpler evaluator using
  2327  @command{lex} and @command{yacc}.
  2328  @item
  2329  @cindex Expression parsing demo
  2330  @cindex Parsing expressions demo
  2331  The @samp{expr} subdirectory is yet another expression evaluator, a library
  2332  designed for ease of use within a C program.  See @file{demos/expr/README} for
  2333  more information.
  2334  @item
  2335  @cindex Factorization demo
  2336  @samp{factorize} is a Pollard-Rho factorization program.
  2337  @item
  2338  @samp{isprime} is a command-line interface to the @code{mpz_probab_prime_p}
  2339  function.
  2340  @item
  2341  @samp{primes} counts or lists primes in an interval, using a sieve.
  2342  @item
  2343  @samp{qcn} is an example use of @code{mpz_kronecker_ui} to estimate quadratic
  2344  class numbers.
  2345  @item
  2346  @cindex @code{perl}
  2347  @cindex GMP Perl module
  2348  @cindex Perl module
  2349  The @samp{perl} subdirectory is a comprehensive perl interface to GMP@.  See
  2350  @file{demos/perl/INSTALL} for more information.  Documentation is in POD
  2351  format in @file{demos/perl/GMP.pm}.
  2352  @end itemize
  2353  
  2354  As an aside, consideration has been given at various times to some sort of
  2355  expression evaluation within the main GMP library.  Going beyond something
  2356  minimal quickly leads to matters like user-defined functions, looping, fixnums
  2357  for control variables, etc, which are considered outside the scope of GMP
  2358  (much closer to language interpreters or compilers, @pxref{Language Bindings}).
  2359  Something simple for program input convenience may yet be a possibility, a
  2360  combination of the @file{expr} demo and the @file{pexpr} tree back-end
  2361  perhaps.  But for now the above evaluators are offered as illustrations.
  2362  
  2363  
  2364  @need 1000
  2365  @node Efficiency, Debugging, Demonstration Programs, GMP Basics
  2366  @section Efficiency
  2367  @cindex Efficiency
  2368  
  2369  @table @asis
  2370  @item Small Operands
  2371  @cindex Small operands
  2372  On small operands, the time for function call overheads and memory allocation
  2373  can be significant in comparison to actual calculation.  This is unavoidable
  2374  in a general purpose variable precision library, although GMP attempts to be
  2375  as efficient as it can on both large and small operands.
  2376  
  2377  @item Static Linking
  2378  @cindex Static linking
  2379  On some CPUs, in particular the x86s, the static @file{libgmp.a} should be
  2380  used for maximum speed, since the PIC code in the shared @file{libgmp.so} will
  2381  have a small overhead on each function call and global data address.  For many
  2382  programs this will be insignificant, but for long calculations there's a gain
  2383  to be had.
  2384  
  2385  @item Initializing and Clearing
  2386  @cindex Initializing and clearing
  2387  Avoid excessive initializing and clearing of variables, since this can be
  2388  quite time consuming, especially in comparison to otherwise fast operations
  2389  like addition.
  2390  
  2391  A language interpreter might want to keep a free list or stack of
  2392  initialized variables ready for use.  It should be possible to integrate
  2393  something like that with a garbage collector too.
  2394  
  2395  @item Reallocations
  2396  @cindex Reallocations
  2397  An @code{mpz_t} or @code{mpq_t} variable used to hold successively increasing
  2398  values will have its memory repeatedly @code{realloc}ed, which could be quite
  2399  slow or could fragment memory, depending on the C library.  If an application
  2400  can estimate the final size then @code{mpz_init2} or @code{mpz_realloc2} can
  2401  be called to allocate the necessary space from the beginning
  2402  (@pxref{Initializing Integers}).
  2403  
  2404  It doesn't matter if a size set with @code{mpz_init2} or @code{mpz_realloc2}
  2405  is too small, since all functions will do a further reallocation if necessary.
  2406  Badly overestimating memory required will waste space though.
  2407  
  2408  @item @code{2exp} Functions
  2409  @cindex @code{2exp} functions
  2410  It's up to an application to call functions like @code{mpz_mul_2exp} when
  2411  appropriate.  General purpose functions like @code{mpz_mul} make no attempt to
  2412  identify powers of two or other special forms, because such inputs will
  2413  usually be very rare and testing every time would be wasteful.
  2414  
  2415  @item @code{ui} and @code{si} Functions
  2416  @cindex @code{ui} and @code{si} functions
  2417  The @code{ui} functions and the small number of @code{si} functions exist for
  2418  convenience and should be used where applicable.  But if for example an
  2419  @code{mpz_t} contains a value that fits in an @code{unsigned long} there's no
  2420  need to extract it and call a @code{ui} function, just use the regular @code{mpz}
  2421  function.
  2422  
  2423  @item In-Place Operations
  2424  @cindex In-place operations
  2425  @code{mpz_abs}, @code{mpq_abs}, @code{mpf_abs}, @code{mpz_neg}, @code{mpq_neg}
  2426  and @code{mpf_neg} are fast when used for in-place operations like
  2427  @code{mpz_abs(x,x)}, since in the current implementation only a single field
  2428  of @code{x} needs changing.  On suitable compilers (GCC for instance) this is
  2429  inlined too.
  2430  
  2431  @code{mpz_add_ui}, @code{mpz_sub_ui}, @code{mpf_add_ui} and @code{mpf_sub_ui}
  2432  benefit from an in-place operation like @code{mpz_add_ui(x,x,y)}, since
  2433  usually only one or two limbs of @code{x} will need to be changed.  The same
  2434  applies to the full precision @code{mpz_add} etc if @code{y} is small.  If
  2435  @code{y} is big then cache locality may be helped, but that's all.
  2436  
  2437  @code{mpz_mul} is currently the opposite, a separate destination is slightly
  2438  better.  A call like @code{mpz_mul(x,x,y)} will, unless @code{y} is only one
  2439  limb, make a temporary copy of @code{x} before forming the result.  Normally
  2440  that copying will only be a tiny fraction of the time for the multiply, so
  2441  this is not a particularly important consideration.
  2442  
  2443  @code{mpz_set}, @code{mpq_set}, @code{mpq_set_num}, @code{mpf_set}, etc, make
  2444  no attempt to recognise a copy of something to itself, so a call like
  2445  @code{mpz_set(x,x)} will be wasteful.  Naturally that would never be written
  2446  deliberately, but if it might arise from two pointers to the same object then
  2447  a test to avoid it might be desirable.
  2448  
  2449  @example
  2450  if (x != y)
  2451    mpz_set (x, y);
  2452  @end example
  2453  
  2454  Note that it's never worth introducing extra @code{mpz_set} calls just to get
  2455  in-place operations.  If a result should go to a particular variable then just
  2456  direct it there and let GMP take care of data movement.
  2457  
  2458  @item Divisibility Testing (Small Integers)
  2459  @cindex Divisibility testing
  2460  @code{mpz_divisible_ui_p} and @code{mpz_congruent_ui_p} are the best functions
  2461  for testing whether an @code{mpz_t} is divisible by an individual small
  2462  integer.  They use an algorithm which is faster than @code{mpz_tdiv_ui}, but
  2463  which gives no useful information about the actual remainder, only whether
  2464  it's zero (or a particular value).
  2465  
  2466  However when testing divisibility by several small integers, it's best to take
  2467  a remainder modulo their product, to save multi-precision operations.  For
  2468  instance to test whether a number is divisible by any of 23, 29 or 31 take a
  2469  remainder modulo @math{23@times{}29@times{}31 = 20677} and then test that.
  2470  
  2471  The division functions like @code{mpz_tdiv_q_ui} which give a quotient as well
  2472  as a remainder are generally a little slower than the remainder-only functions
  2473  like @code{mpz_tdiv_ui}.  If the quotient is only rarely wanted then it's
  2474  probably best to just take a remainder and then go back and calculate the
  2475  quotient if and when it's wanted (@code{mpz_divexact_ui} can be used if the
  2476  remainder is zero).
  2477  
  2478  @item Rational Arithmetic
  2479  @cindex Rational arithmetic
  2480  The @code{mpq} functions operate on @code{mpq_t} values with no common factors
  2481  in the numerator and denominator.  Common factors are checked-for and cast out
  2482  as necessary.  In general, cancelling factors every time is the best approach
  2483  since it minimizes the sizes for subsequent operations.
  2484  
  2485  However, applications that know something about the factorization of the
  2486  values they're working with might be able to avoid some of the GCDs used for
  2487  canonicalization, or swap them for divisions.  For example when multiplying by
  2488  a prime it's enough to check for factors of it in the denominator instead of
  2489  doing a full GCD@.  Or when forming a big product it might be known that very
  2490  little cancellation will be possible, and so canonicalization can be left to
  2491  the end.
  2492  
  2493  The @code{mpq_numref} and @code{mpq_denref} macros give access to the
  2494  numerator and denominator to do things outside the scope of the supplied
  2495  @code{mpq} functions.  @xref{Applying Integer Functions}.
  2496  
  2497  The canonical form for rationals allows mixed-type @code{mpq_t} and integer
  2498  additions or subtractions to be done directly with multiples of the
  2499  denominator.  This will be somewhat faster than @code{mpq_add}.  For example,
  2500  
  2501  @example
  2502  /* mpq increment */
  2503  mpz_add (mpq_numref(q), mpq_numref(q), mpq_denref(q));
  2504  
  2505  /* mpq += unsigned long */
  2506  mpz_addmul_ui (mpq_numref(q), mpq_denref(q), 123UL);
  2507  
  2508  /* mpq -= mpz */
  2509  mpz_submul (mpq_numref(q), mpq_denref(q), z);
  2510  @end example
  2511  
  2512  @item Number Sequences
  2513  @cindex Number sequences
  2514  Functions like @code{mpz_fac_ui}, @code{mpz_fib_ui} and @code{mpz_bin_uiui}
  2515  are designed for calculating isolated values.  If a range of values is wanted
  2516  it's probably best to call one of them once to get a starting point and iterate from there.
  2517  
  2518  @item Text Input/Output
  2519  @cindex Text input/output
  2520  Hexadecimal or octal are suggested for input or output in text form.
  2521  Power-of-2 bases like these can be converted much more efficiently than other
  2522  bases, like decimal.  For big numbers there's usually nothing of particular
  2523  interest to be seen in the digits, so the base doesn't matter much.
  2524  
  2525  Maybe we can hope octal will one day become the normal base for everyday use,
  2526  as proposed by King Charles XII of Sweden and later reformers.
  2527  @c Reference: Knuth volume 2 section 4.1, page 184 of second edition.  :-)
  2528  @end table
  2529  
  2530  
  2531  @node Debugging, Profiling, Efficiency, GMP Basics
  2532  @section Debugging
  2533  @cindex Debugging
  2534  
  2535  @table @asis
  2536  @item Stack Overflow
  2537  @cindex Stack overflow
  2538  @cindex Segmentation violation
  2539  @cindex Bus error
  2540  Depending on the system, a segmentation violation or bus error might be the
  2541  only indication of stack overflow.  See @samp{--enable-alloca} choices in
  2542  @ref{Build Options}, for how to address this.
  2543  
  2544  In new enough versions of GCC, @samp{-fstack-check} may be able to ensure an
  2545  overflow is recognised by the system before too much damage is done, or
  2546  @samp{-fstack-limit-symbol} or @samp{-fstack-limit-register} may be able to
  2547  add checking if the system itself doesn't do any (@pxref{Code Gen Options,,
  2548  Options for Code Generation, gcc, Using the GNU Compiler Collection (GCC)}).
  2549  These options must be added to the @samp{CFLAGS} used in the GMP build
  2550  (@pxref{Build Options}); adding them just to an application will have no
  2551  effect.  Note also they're a slowdown, adding overhead to each function call
  2552  and each stack allocation.
  2553  
  2554  @item Heap Problems
  2555  @cindex Heap problems
  2556  @cindex Malloc problems
  2557  The most likely cause of application problems with GMP is heap corruption.
  2558  Failing to @code{init} GMP variables will have unpredictable effects, and
  2559  corruption arising elsewhere in a program may well affect GMP@.  Initializing
  2560  GMP variables more than once or failing to clear them will cause memory leaks.
  2561  
  2562  @cindex Malloc debugger
  2563  In all such cases a @code{malloc} debugger is recommended.  On a GNU or BSD
  2564  system the standard C library @code{malloc} has some diagnostic facilities,
  2565  see @ref{Allocation Debugging,, Allocation Debugging, libc, The GNU C Library
  2566  Reference Manual}, or @samp{man 3 malloc}.  Other possibilities, in no
  2567  particular order, include
  2568  
  2569  @display
  2570  @uref{http://www.inf.ethz.ch/personal/biere/projects/ccmalloc/}
  2571  @uref{http://dmalloc.com/}
  2572  @uref{http://www.perens.com/FreeSoftware/} @ (electric fence)
  2573  @uref{http://packages.debian.org/stable/devel/fda}
  2574  @uref{http://www.gnupdate.org/components/leakbug/}
  2575  @uref{http://people.redhat.com/~otaylor/memprof/}
  2576  @uref{http://www.cbmamiga.demon.co.uk/mpatrol/}
  2577  @end display
  2578  
  2579  The GMP default allocation routines in @file{memory.c} also have a simple
  2580  sentinel scheme which can be enabled with @code{#define DEBUG} in that file.
  2581  This is mainly designed for detecting buffer overruns during GMP development,
  2582  but might find other uses.
  2583  
  2584  @item Stack Backtraces
  2585  @cindex Stack backtrace
  2586  On some systems the compiler options GMP uses by default can interfere with
  2587  debugging.  In particular on x86 and 68k systems @samp{-fomit-frame-pointer}
  2588  is used and this generally inhibits stack backtracing.  Recompiling without
  2589  such options may help while debugging, though the usual caveats about it
  2590  potentially moving a memory problem or hiding a compiler bug will apply.
  2591  
  2592  @item GDB, the GNU Debugger
  2593  @cindex GDB
  2594  @cindex GNU Debugger
  2595  A sample @file{.gdbinit} is included in the distribution, showing how to call
  2596  some undocumented dump functions to print GMP variables from within GDB@.  Note
  2597  that these functions shouldn't be used in final application code since they're
  2598  undocumented and may be subject to incompatible changes in future versions of
  2599  GMP.
  2600  
  2601  @item Source File Paths
  2602  GMP has multiple source files with the same name, in different directories.
  2603  For example @file{mpz}, @file{mpq} and @file{mpf} each have an
  2604  @file{init.c}.  If the debugger can't already determine the right one it may
  2605  help to build with absolute paths on each C file.  One way to do that is to
  2606  use a separate object directory with an absolute path to the source directory.
  2607  
  2608  @example
  2609  cd /my/build/dir
  2610  /my/source/dir/gmp-@value{VERSION}/configure
  2611  @end example
  2612  
  2613  This works via @code{VPATH}, and might require GNU @command{make}.
  2614  Alternatively it might be possible to change the @code{.c.lo} rules
  2615  appropriately.
  2616  
  2617  @item Assertion Checking
  2618  @cindex Assertion checking
  2619  The build option @option{--enable-assert} is available to add some consistency
  2620  checks to the library (see @ref{Build Options}).  These are likely to be of
  2621  limited value to most applications.  Assertion failures are just as likely to
  2622  indicate memory corruption as a library or compiler bug.
  2623  
  2624  Applications using the low-level @code{mpn} functions, however, will benefit
  2625  from @option{--enable-assert} since it adds checks on the parameters of most
  2626  such functions, many of which have subtle restrictions on their usage.  Note
  2627  however that only the generic C code has checks, not the assembly code, so
  2628  @option{--disable-assembly} should be used for maximum checking.
  2629  
  2630  @item Temporary Memory Checking
  2631  The build option @option{--enable-alloca=debug} arranges that each block of
  2632  temporary memory in GMP is allocated with a separate call to @code{malloc} (or
  2633  the allocation function set with @code{mp_set_memory_functions}).
  2634  
  2635  This can help a malloc debugger detect accesses outside the intended bounds,
  2636  or detect memory not released.  In a normal build, on the other hand,
  2637  temporary memory is allocated in blocks which GMP divides up for its own use,
  2638  or may be allocated with a compiler builtin @code{alloca} which will go
  2639  nowhere near any malloc debugger hooks.
  2640  
  2641  @item Maximum Debuggability
  2642  To summarize the above, a GMP build for maximum debuggability would be
  2643  
  2644  @example
  2645  ./configure --disable-shared --enable-assert \
  2646    --enable-alloca=debug --disable-assembly CFLAGS=-g
  2647  @end example
  2648  
  2649  For C++, add @samp{--enable-cxx CXXFLAGS=-g}.
  2650  
  2651  @item Checker
  2652  @cindex Checker
  2653  @cindex GCC Checker
  2654  The GCC checker (@uref{https://savannah.nongnu.org/projects/checker/}) can be
  2655  used with GMP@.  It contains a stub library which means GMP applications
  2656  compiled with checker can use a normal GMP build.
  2657  
  2658  A build of GMP with checking within GMP itself can be made.  This will run
  2659  very very slowly.  On GNU/Linux for example,
  2660  
  2661  @cindex @command{checkergcc}
  2662  @example
  2663  ./configure --disable-assembly CC=checkergcc
  2664  @end example
  2665  
  2666  @option{--disable-assembly} must be used, since the GMP assembly code doesn't
  2667  support the checking scheme.  The GMP C++ features cannot be used, since
  2668  current versions of checker (0.9.9.1) don't yet support the standard C++
  2669  library.
  2670  
  2671  @item Valgrind
  2672  @cindex Valgrind
  2673  Valgrind (@uref{http://valgrind.org/}) is a memory checker for x86, ARM, MIPS,
  2674  PowerPC, and S/390.  It translates and emulates machine instructions to do
  2675  strong checks for uninitialized data (at the level of individual bits), memory
  2676  accesses through bad pointers, and memory leaks.
  2677  
  2678  Valgrind does not always support every possible instruction, in particular
  2679  ones recently added to an ISA.  Valgrind might therefore be incompatible with
  2680  a recent GMP or even a less recent GMP which is compiled using a recent GCC.
  2681  
  2682  GMP's assembly code sometimes promotes a read of the limbs to some larger size,
  2683  for efficiency.  GMP will do this even at the start and end of a multilimb
  2684  operand, using naturally aligned operations on the larger type.  This may lead
  2685  to benign reads outside of allocated areas, triggering complaints from
  2686  Valgrind.  Valgrind's option @samp{--partial-loads-ok=yes} should help.
  2687  
  2688  @item Other Problems
  2689  Any suspected bug in GMP itself should be isolated to make sure it's not an
  2690  application problem, see @ref{Reporting Bugs}.
  2691  @end table
  2692  
  2693  
  2694  @node Profiling, Autoconf, Debugging, GMP Basics
  2695  @section Profiling
  2696  @cindex Profiling
  2697  @cindex Execution profiling
  2698  @cindex @code{--enable-profiling}
  2699  
  2700  Running a program under a profiler is a good way to find where it's spending
  2701  most time and where improvements can be best sought.  The profiling choices
  2702  for a GMP build are as follows.
  2703  
  2704  @table @asis
  2705  @item @samp{--disable-profiling}
  2706  The default is to add nothing special for profiling.
  2707  
  2708  It should be possible to just compile the mainline of a program with @code{-p}
  2709  and use @command{prof} to get a profile consisting of timer-based sampling of
  2710  the program counter.  Most of the GMP assembly code has the necessary symbol
  2711  information.
  2712  
  2713  This approach has the advantage of minimizing interference with normal program
  2714  operation, but on most systems the resolution of the sampling is quite low (10
  2715  milliseconds for instance), requiring long runs to get accurate information.
  2716  
  2717  @item @samp{--enable-profiling=prof}
  2718  @cindex @code{prof}
  2719  Build with support for the system @command{prof}, which means @samp{-p} added
  2720  to the @samp{CFLAGS}.
  2721  
  2722  This provides call counting in addition to program counter sampling, which
  2723  allows the most frequently called routines to be identified, and an average
  2724  time spent in each routine to be determined.
  2725  
  2726  The x86 assembly code has support for this option, but on other processors
  2727  the assembly routines will be as if compiled without @samp{-p} and therefore
  2728  won't appear in the call counts.
  2729  
  2730  On some systems, such as GNU/Linux, @samp{-p} in fact means @samp{-pg} and in
  2731  this case @samp{--enable-profiling=gprof} described below should be used
  2732  instead.
  2733  
  2734  @item @samp{--enable-profiling=gprof}
  2735  @cindex @code{gprof}
  2736  Build with support for @command{gprof}, which means @samp{-pg} added to the
  2737  @samp{CFLAGS}.
  2738  
  2739  This provides call graph construction in addition to call counting and program
  2740  counter sampling, which makes it possible to count calls coming from different
  2741  locations.  For example the number of calls to @code{mpn_mul} from
  2742  @code{mpz_mul} versus the number from @code{mpf_mul}.  The program counter
  2743  sampling is still flat though, so only a total time in @code{mpn_mul} would be
  2744  accumulated, not a separate amount for each call site.
  2745  
  2746  The x86 assembly code has support for this option, but on other processors
  2747  the assembly routines will be as if compiled without @samp{-pg} and therefore
  2748  not be included in the call counts.
  2749  
  2750  On x86 and m68k systems @samp{-pg} and @samp{-fomit-frame-pointer} are
  2751  incompatible, so the latter is omitted from the default flags in that case,
  2752  which might result in poorer code generation.
  2753  
  2754  Incidentally, it should be possible to use the @command{gprof} program with a
  2755  plain @samp{--enable-profiling=prof} build.  But in that case only the
  2756  @samp{gprof -p} flat profile and call counts can be expected to be valid, not
  2757  the @samp{gprof -q} call graph.
  2758  
  2759  @item @samp{--enable-profiling=instrument}
  2760  @cindex @code{-finstrument-functions}
  2761  @cindex @code{instrument-functions}
  2762  Build with the GCC option @samp{-finstrument-functions} added to the
  2763  @samp{CFLAGS} (@pxref{Code Gen Options,, Options for Code Generation, gcc,
  2764  Using the GNU Compiler Collection (GCC)}).
  2765  
  2766  This inserts special instrumenting calls at the start and end of each
  2767  function, allowing exact timing and full call graph construction.
  2768  
  2769  This instrumenting is not normally a standard system feature and will require
  2770  support from an external library, such as
  2771  
  2772  @cindex FunctionCheck
  2773  @cindex fnccheck
  2774  @display
  2775  @uref{http://sourceforge.net/projects/fnccheck/}
  2776  @end display
  2777  
  2778  This should be included in @samp{LIBS} during the GMP configure so that test
  2779  programs will link.  For example,
  2780  
  2781  @example
  2782  ./configure --enable-profiling=instrument LIBS=-lfc
  2783  @end example
  2784  
  2785  On a GNU system the C library provides dummy instrumenting functions, so
  2786  programs compiled with this option will link.  In this case it's only
  2787  necessary to ensure the correct library is added when linking an application.
  2788  
  2789  The x86 assembly code supports this option, but on other processors the
  2790  assembly routines will be as if compiled without
  2791  @samp{-finstrument-functions} meaning time spent in them will effectively be
  2792  attributed to their caller.
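For orientation, the instrumenting calls GCC inserts are to
@code{__cyg_profile_func_enter} and @code{__cyg_profile_func_exit}.  The
following sketch merely counts function entries, which is far less than a real
profiling library does; the accessor @code{instrumented_entries} is a
hypothetical name.

```c
/* Number of instrumented function entries seen so far. */
static unsigned long entry_count = 0;

/* The hooks and the accessor are marked no_instrument_function so that
   they are not themselves instrumented. */
void __cyg_profile_func_enter (void *this_fn, void *call_site)
  __attribute__ ((no_instrument_function));
void __cyg_profile_func_exit (void *this_fn, void *call_site)
  __attribute__ ((no_instrument_function));
unsigned long instrumented_entries (void)
  __attribute__ ((no_instrument_function));

/* Called on entry to every function compiled with -finstrument-functions. */
void
__cyg_profile_func_enter (void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
  entry_count++;
}

/* Called on exit from every such function; nothing to do in this sketch. */
void
__cyg_profile_func_exit (void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
}

unsigned long
instrumented_entries (void)
{
  return entry_count;
}
```

Linking these definitions into an application would take the place of the
external library, under the assumption that a simple count is all that's
wanted.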
  2793  @end table
  2794  
  2795  
  2796  @node Autoconf, Emacs, Profiling, GMP Basics
  2797  @section Autoconf
  2798  @cindex Autoconf
  2799  
  2800  Autoconf based applications can easily check whether GMP is installed.  The
  2801  only thing to be noted is that GMP library symbols from version 3 onwards have
  2802  prefixes like @code{__gmpz}.  The following therefore would be a simple test,
  2803  
  2804  @cindex @code{AC_CHECK_LIB}
  2805  @example
  2806  AC_CHECK_LIB(gmp, __gmpz_init)
  2807  @end example
  2808  
  2809  This just uses the default @code{AC_CHECK_LIB} actions for found or not found,
  2810  but an application that must have GMP would want to generate an error if not
  2811  found.  For example,
  2812  
  2813  @example
  2814  AC_CHECK_LIB(gmp, __gmpz_init, ,
  2815    [AC_MSG_ERROR([GNU MP not found, see https://gmplib.org/])])
  2816  @end example
  2817  
  2818  If functions added in some particular version of GMP are required, then one of
  2819  those can be used when checking.  For example @code{mpz_mul_si} was added in
  2820  GMP 3.1,
  2821  
  2822  @example
  2823  AC_CHECK_LIB(gmp, __gmpz_mul_si, ,
  2824    [AC_MSG_ERROR(
  2825    [GNU MP not found, or not 3.1 or up, see https://gmplib.org/])])
  2826  @end example
  2827  
  2828  An alternative would be to test the version number in @file{gmp.h} using say
  2829  @code{AC_EGREP_CPP}.  That would make it possible to test the exact version,
  2830  if some particular sub-minor release is known to be necessary.
  2831  
  2832  In general it's recommended that applications should simply demand a new
  2833  enough GMP rather than trying to provide supplements for features not
  2834  available in past versions.
  2835  
  2836  Occasionally an application will need or want to know the size of a type at
  2837  configuration or preprocessing time, not just with @code{sizeof} in the code.
  2838  This can be done in the normal way with @code{mp_limb_t} etc, but GMP 4.0 or
  2839  up is best for this, since prior versions needed certain @samp{-D} defines on
  2840  systems using a @code{long long} limb.  The following would suit Autoconf 2.50
  2841  or up,
  2842  
  2843  @example
  2844  AC_CHECK_SIZEOF(mp_limb_t, , [#include <gmp.h>])
  2845  @end example
  2846  
  2847  
  2848  @node Emacs,  , Autoconf, GMP Basics
  2849  @section Emacs
  2850  @cindex Emacs
  2851  @cindex @code{info-lookup-symbol}
  2852  
  2853  @key{C-h C-i} (@code{info-lookup-symbol}) is a good way to find documentation
  2854  on C functions while editing (@pxref{Info Lookup, , Info Documentation Lookup,
  2855  emacs, The Emacs Editor}).
  2856  
  2857  The GMP manual can be included in such lookups by putting the following in
  2858  your @file{.emacs},
  2859  
  2860  @c  This isn't pretty, but there doesn't seem to be a better way (in emacs
  2861  @c  21.2 at least).  info-lookup->mode-value could be used for the "assoc"s,
  2862  @c  but that function isn't documented, whereas info-lookup-alist is.
  2863  @c
  2864  @example
  2865  (eval-after-load "info-look"
  2866    '(let ((mode-value (assoc 'c-mode (assoc 'symbol info-lookup-alist))))
  2867       (setcar (nthcdr 3 mode-value)
  2868               (cons '("(gmp)Function Index" nil "^ -.* " "\\>")
  2869                     (nth 3 mode-value)))))
  2870  @end example
  2871  
  2872  
  2873  @node Reporting Bugs, Integer Functions, GMP Basics, Top
  2874  @comment  node-name,  next,  previous,  up
  2875  @chapter Reporting Bugs
  2876  @cindex Reporting bugs
  2877  @cindex Bug reporting
  2878  
  2879  If you think you have found a bug in the GMP library, please investigate it
  2880  and report it.  We have made this library available to you, and it is not too
  2881  much to ask you to report the bugs you find.
  2882  
  2883  Before you report a bug, check it's not already addressed in @ref{Known Build
  2884  Problems}, or perhaps @ref{Notes for Particular Systems}.  You may also want
  2885  to check @uref{https://gmplib.org/} for patches for this release.
  2886  
  2887  Please include the following in any report,
  2888  
  2889  @itemize @bullet
  2890  @item
  2891  The GMP version number, and if pre-packaged or patched then say so.
  2892  
  2893  @item
  2894  A test program that makes it possible for us to reproduce the bug.  Include
  2895  instructions on how to run the program.
  2896  
  2897  @item
  2898  A description of what is wrong.  If the results are incorrect, say in what way.
  2899  If you get a crash, say so.
  2900  
  2901  @item
  2902  If you get a crash, include a stack backtrace from the debugger if it's
  2903  informative (@samp{where} in @command{gdb}, or @samp{$C} in @command{adb}).
  2904  
  2905  @item
  2906  Please do not send core dumps, executables or @command{strace}s.
  2907  
  2908  @item
  2909  The @samp{configure} options you used when building GMP, if any.
  2910  
  2911  @item
  2912  The output from @samp{configure}, as printed to stdout, with any options used.
  2913  
  2914  @item
  2915  The name of the compiler and its version.  For @command{gcc}, get the version
  2916  with @samp{gcc -v}, otherwise perhaps @samp{what `which cc`}, or similar.
  2917  
  2918  @item
  2919  The output from running @samp{uname -a}.
  2920  
  2921  @item
  2922  The output from running @samp{./config.guess}, and from running
  2923  @samp{./configfsf.guess} (might be the same).
  2924  
  2925  @item
  2926  If the bug is related to @samp{configure}, then the compressed contents of
  2927  @file{config.log}.
  2928  
  2929  @item
  2930  If the bug is related to an @file{asm} file not assembling, then the contents
  2931  of @file{config.m4} and the offending line or lines from the temporary
  2932  @file{mpn/tmp-<file>.s}.
  2933  @end itemize
  2934  
  2935  Please make an effort to produce a self-contained report, with something
  2936  definite that can be tested or debugged.  Vague queries or piecemeal messages
  2937  are difficult to act on and don't help the development effort.
  2938  
  2939  It is not uncommon that an observed problem is actually due to a bug in the
  2940  compiler; the GMP code tends to explore interesting corners in compilers.
  2941  
  2942  If your bug report is good, we will do our best to help you get a corrected
  2943  version of the library; if the bug report is poor, we won't do anything about
  2944  it (except maybe ask you to send a better report).
  2945  
  2946  Send your report to: @email{gmp-bugs@@gmplib.org}.
  2947  
  2948  If you think something in this manual is unclear, or downright incorrect, or if
  2949  the language needs to be improved, please send a note to the same address.
  2950  
  2951  
  2952  @node Integer Functions, Rational Number Functions, Reporting Bugs, Top
  2953  @comment  node-name,  next,  previous,  up
  2954  @chapter Integer Functions
  2955  @cindex Integer functions
  2956  
  2957  This chapter describes the GMP functions for performing integer arithmetic.
  2958  These functions start with the prefix @code{mpz_}.
  2959  
  2960  GMP integers are stored in objects of type @code{mpz_t}.
  2961  
  2962  @menu
  2963  * Initializing Integers::
  2964  * Assigning Integers::
  2965  * Simultaneous Integer Init & Assign::
  2966  * Converting Integers::
  2967  * Integer Arithmetic::
  2968  * Integer Division::
  2969  * Integer Exponentiation::
  2970  * Integer Roots::
  2971  * Number Theoretic Functions::
  2972  * Integer Comparisons::
  2973  * Integer Logic and Bit Fiddling::
  2974  * I/O of Integers::
  2975  * Integer Random Numbers::
  2976  * Integer Import and Export::
  2977  * Miscellaneous Integer Functions::
  2978  * Integer Special Functions::
  2979  @end menu
  2980  
  2981  @node Initializing Integers, Assigning Integers, Integer Functions, Integer Functions
  2982  @comment  node-name,  next,  previous,  up
  2983  @section Initialization Functions
  2984  @cindex Integer initialization functions
  2985  @cindex Initialization functions
  2986  
  2987  The functions for integer arithmetic assume that all integer objects are
  2988  initialized.  You do that by calling the function @code{mpz_init}.  For
  2989  example,
  2990  
  2991  @example
  2992  @{
  2993    mpz_t integ;
  2994    mpz_init (integ);
  2995    @dots{}
  2996    mpz_add (integ, @dots{});
  2997    @dots{}
  2998    mpz_sub (integ, @dots{});
  2999  
  3000    /* Unless the program is about to exit, do ... */
  3001    mpz_clear (integ);
  3002  @}
  3003  @end example
  3004  
  3005  As you can see, you can store new values any number of times, once an
  3006  object is initialized.
  3007  
  3008  @deftypefun void mpz_init (mpz_t @var{x})
  3009  Initialize @var{x}, and set its value to 0.
  3010  @end deftypefun
  3011  
  3012  @deftypefun void mpz_inits (mpz_t @var{x}, ...)
  3013  Initialize a NULL-terminated list of @code{mpz_t} variables, and set their
  3014  values to 0.
  3015  @end deftypefun
  3016  
  3017  @deftypefun void mpz_init2 (mpz_t @var{x}, mp_bitcnt_t @var{n})
  3018  Initialize @var{x}, with space for @var{n}-bit numbers, and set its value to 0.
  3019  Calling this function instead of @code{mpz_init} or @code{mpz_inits} is never
  3020  necessary; reallocation is handled automatically by GMP when needed.
  3021  
  3022  While @var{n} defines the initial space, @var{x} will grow automatically in the
  3023  normal way, if necessary, for subsequent values stored.  @code{mpz_init2} makes
  3024  it possible to avoid such reallocations if a maximum size is known in advance.
  3025  
  3026  In preparation for an operation, GMP often allocates one limb more than
  3027  ultimately needed.  To make sure GMP will not perform reallocation for
  3028  @var{x}, you need to add the number of bits in @code{mp_limb_t} to @var{n}.
  3029  @end deftypefun
  3030  
  3031  @deftypefun void mpz_clear (mpz_t @var{x})
  3032  Free the space occupied by @var{x}.  Call this function for all @code{mpz_t}
  3033  variables when you are done with them.
  3034  @end deftypefun
  3035  
  3036  @deftypefun void mpz_clears (mpz_t @var{x}, ...)
  3037  Free the space occupied by a NULL-terminated list of @code{mpz_t} variables.
  3038  @end deftypefun
  3039  
  3040  @deftypefun void mpz_realloc2 (mpz_t @var{x}, mp_bitcnt_t @var{n})
  3041  Change the space allocated for @var{x} to @var{n} bits.  The value in @var{x}
  3042  is preserved if it fits, or is set to 0 if not.
  3043  
  3044  Calling this function is never necessary; reallocation is handled automatically
  3045  by GMP when needed.  But this function can be used to increase the space for a
  3046  variable in order to avoid repeated automatic reallocations, or to decrease it
  3047  to give memory back to the heap.
  3048  @end deftypefun
  3049  
  3050  
  3051  @node Assigning Integers, Simultaneous Integer Init & Assign, Initializing Integers, Integer Functions
  3052  @comment  node-name,  next,  previous,  up
  3053  @section Assignment Functions
  3054  @cindex Integer assignment functions
  3055  @cindex Assignment functions
  3056  
  3057  These functions assign new values to already initialized integers
  3058  (@pxref{Initializing Integers}).
  3059  
  3060  @deftypefun void mpz_set (mpz_t @var{rop}, const mpz_t @var{op})
  3061  @deftypefunx void mpz_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
  3062  @deftypefunx void mpz_set_si (mpz_t @var{rop}, signed long int @var{op})
  3063  @deftypefunx void mpz_set_d (mpz_t @var{rop}, double @var{op})
  3064  @deftypefunx void mpz_set_q (mpz_t @var{rop}, const mpq_t @var{op})
  3065  @deftypefunx void mpz_set_f (mpz_t @var{rop}, const mpf_t @var{op})
  3066  Set the value of @var{rop} from @var{op}.
  3067  
  3068  @code{mpz_set_d}, @code{mpz_set_q} and @code{mpz_set_f} truncate @var{op} to
  3069  make it an integer.
  3070  @end deftypefun
  3071  
  3072  @deftypefun int mpz_set_str (mpz_t @var{rop}, const char *@var{str}, int @var{base})
  3073  Set the value of @var{rop} from @var{str}, a null-terminated C string in base
  3074  @var{base}.  White space is allowed in the string, and is simply ignored.
  3075  
  3076  The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
  3077  characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
  3078  @code{0B} for binary, @code{0} for octal, or decimal otherwise.
  3079  
  3080  For bases up to 36, case is ignored; upper-case and lower-case letters have
  3081  the same value.  For bases 37 to 62, upper-case letters represent the usual
  3082  10..35 while lower-case letters represent 36..61.
  3083  
  3084  This function returns 0 if the entire string is a valid number in base
  3085  @var{base}.  Otherwise it returns @minus{}1.
  3086  @c
  3087  @c  It turns out that it is not entirely true that this function ignores
  3088  @c  white-space.  It does ignore it between digits, but not after a minus sign
  3089  @c  or within or after ``0x''.  Some thought was given to disallowing all
  3090  @c  whitespace, but that would be an incompatible change, whitespace has been
  3091  @c  documented as ignored ever since GMP 1.
  3092  @c
  3093  @end deftypefun
  3094  
  3095  @deftypefun void mpz_swap (mpz_t @var{rop1}, mpz_t @var{rop2})
  3096  Swap the values @var{rop1} and @var{rop2} efficiently.
  3097  @end deftypefun
  3098  
  3099  
  3100  @node Simultaneous Integer Init & Assign, Converting Integers, Assigning Integers, Integer Functions
  3101  @comment  node-name,  next,  previous,  up
  3102  @section Combined Initialization and Assignment Functions
  3103  @cindex Integer assignment functions
  3104  @cindex Assignment functions
  3105  @cindex Integer initialization functions
  3106  @cindex Initialization functions
  3107  
  3108  For convenience, GMP provides a parallel series of initialize-and-set functions
  3109  which initialize the output and then store the value there.  These functions'
  3110  names have the form @code{mpz_init_set@dots{}}.
  3111  
  3112  Here is an example of using one:
  3113  
  3114  @example
  3115  @{
  3116    mpz_t pie;
  3117    mpz_init_set_str (pie, "3141592653589793238462643383279502884", 10);
  3118    @dots{}
  3119    mpz_sub (pie, @dots{});
  3120    @dots{}
  3121    mpz_clear (pie);
  3122  @}
  3123  @end example
  3124  
  3125  @noindent
  3126  Once the integer has been initialized by any of the @code{mpz_init_set@dots{}}
  3127  functions, it can be used as the source or destination operand for the ordinary
  3128  integer functions.  Don't use an initialize-and-set function on a variable
  3129  already initialized!
  3130  
  3131  @deftypefun void mpz_init_set (mpz_t @var{rop}, const mpz_t @var{op})
  3132  @deftypefunx void mpz_init_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
  3133  @deftypefunx void mpz_init_set_si (mpz_t @var{rop}, signed long int @var{op})
  3134  @deftypefunx void mpz_init_set_d (mpz_t @var{rop}, double @var{op})
  3135  Initialize @var{rop} with limb space and set the initial numeric value from
  3136  @var{op}.
  3137  @end deftypefun
  3138  
  3139  @deftypefun int mpz_init_set_str (mpz_t @var{rop}, const char *@var{str}, int @var{base})
  3140  Initialize @var{rop} and set its value like @code{mpz_set_str} (see its
  3141  documentation above for details).
  3142  
  3143  If the string is a correct base @var{base} number, the function returns 0;
  3144  if an error occurs it returns @minus{}1.  @var{rop} is initialized even if
  3145  an error occurs.  (I.e., you have to call @code{mpz_clear} for it.)
  3146  @end deftypefun
  3147  
  3148  
  3149  @node Converting Integers, Integer Arithmetic, Simultaneous Integer Init & Assign, Integer Functions
  3150  @comment  node-name,  next,  previous,  up
  3151  @section Conversion Functions
  3152  @cindex Integer conversion functions
  3153  @cindex Conversion functions
  3154  
  3155  This section describes functions for converting GMP integers to standard C
  3156  types.  Functions for converting @emph{to} GMP integers are described in
  3157  @ref{Assigning Integers} and @ref{I/O of Integers}.
  3158  
  3159  @deftypefun {unsigned long int} mpz_get_ui (const mpz_t @var{op})
  3160  Return the value of @var{op} as an @code{unsigned long}.
  3161  
  3162  If @var{op} is too big to fit an @code{unsigned long} then just the least
  3163  significant bits that do fit are returned.  The sign of @var{op} is ignored,
  3164  only the absolute value is used.
  3165  @end deftypefun
  3166  
  3167  @deftypefun {signed long int} mpz_get_si (const mpz_t @var{op})
  3168  If @var{op} fits into a @code{signed long int} return the value of @var{op}.
  3169  Otherwise return the least significant part of @var{op}, with the same sign
  3170  as @var{op}.
  3171  
  3172  If @var{op} is too big to fit in a @code{signed long int}, the returned
  3173  result is probably not very useful.  To find out if the value will fit, use
  3174  the function @code{mpz_fits_slong_p}.
  3175  @end deftypefun
  3176  
  3177  @deftypefun double mpz_get_d (const mpz_t @var{op})
  3178  Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
  3179  towards zero).
  3180  
  3181  If the exponent from the conversion is too big, the result is system
  3182  dependent.  An infinity is returned where available.  A hardware overflow trap
  3183  may or may not occur.
  3184  @end deftypefun
  3185  
  3186  @deftypefun double mpz_get_d_2exp (signed long int *@var{exp}, const mpz_t @var{op})
  3187  Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
  3188  towards zero), and returning the exponent separately.
  3189  
  3190  The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the
  3191  exponent is stored to @code{*@var{exp}}.  @m{@var{d} * 2^{exp}, @var{d} *
  3192  2^@var{exp}} is the (truncated) @var{op} value.  If @var{op} is zero, the
  3193  return is @math{0.0} and 0 is stored to @code{*@var{exp}}.
  3194  
  3195  @cindex @code{frexp}
  3196  This is similar to the standard C @code{frexp} function (@pxref{Normalization
  3197  Functions,,, libc, The GNU C Library Reference Manual}).
  3198  @end deftypefun
  3199  
  3200  @deftypefun {char *} mpz_get_str (char *@var{str}, int @var{base}, const mpz_t @var{op})
  3201  Convert @var{op} to a string of digits in base @var{base}.  The base argument
  3202  may vary from 2 to 62 or from @minus{}2 to @minus{}36.
  3203  
  3204  For @var{base} in the range 2..36, digits and lower-case letters are used; for
  3205  @minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
  3206  digits, upper-case letters, and lower-case letters (in that significance order)
  3207  are used.
  3208  
  3209  If @var{str} is @code{NULL}, the result string is allocated using the current
  3210  allocation function (@pxref{Custom Allocation}).  The block will be
  3211  @code{strlen(str)+1} bytes, that being exactly enough for the string and
  3212  null-terminator.
  3213  
  3214  If @var{str} is not @code{NULL}, it should point to a block of storage large
  3215  enough for the result, that being @code{mpz_sizeinbase (@var{op}, @var{base})
  3216  + 2}.  The two extra bytes are for a possible minus sign, and the
  3217  null-terminator.
  3218  
  3219  A pointer to the result string is returned, being either the allocated block,
  3220  or the given @var{str}.
  3221  @end deftypefun
  3222  
  3223  
  3224  @need 2000
  3225  @node Integer Arithmetic, Integer Division, Converting Integers, Integer Functions
  3226  @comment  node-name,  next,  previous,  up
  3227  @section Arithmetic Functions
  3228  @cindex Integer arithmetic functions
  3229  @cindex Arithmetic functions
  3230  
  3231  @deftypefun void mpz_add (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3232  @deftypefunx void mpz_add_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
  3233  Set @var{rop} to @math{@var{op1} + @var{op2}}.
  3234  @end deftypefun
  3235  
  3236  @deftypefun void mpz_sub (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3237  @deftypefunx void mpz_sub_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
  3238  @deftypefunx void mpz_ui_sub (mpz_t @var{rop}, unsigned long int @var{op1}, const mpz_t @var{op2})
  3239  Set @var{rop} to @var{op1} @minus{} @var{op2}.
  3240  @end deftypefun
  3241  
  3242  @deftypefun void mpz_mul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3243  @deftypefunx void mpz_mul_si (mpz_t @var{rop}, const mpz_t @var{op1}, long int @var{op2})
  3244  @deftypefunx void mpz_mul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
  3245  Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}.
  3246  @end deftypefun
  3247  
  3248  @deftypefun void mpz_addmul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3249  @deftypefunx void mpz_addmul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
  3250  Set @var{rop} to @math{@var{rop} + @var{op1} @GMPtimes{} @var{op2}}.
  3251  @end deftypefun
  3252  
  3253  @deftypefun void mpz_submul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3254  @deftypefunx void mpz_submul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
  3255  Set @var{rop} to @math{@var{rop} - @var{op1} @GMPtimes{} @var{op2}}.
  3256  @end deftypefun
  3257  
  3258  @deftypefun void mpz_mul_2exp (mpz_t @var{rop}, const mpz_t @var{op1}, mp_bitcnt_t @var{op2})
  3259  @cindex Bit shift left
  3260  Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
  3261  @var{op2}}.  This operation can also be defined as a left shift by @var{op2}
  3262  bits.
  3263  @end deftypefun
  3264  
  3265  @deftypefun void mpz_neg (mpz_t @var{rop}, const mpz_t @var{op})
  3266  Set @var{rop} to @minus{}@var{op}.
  3267  @end deftypefun
  3268  
  3269  @deftypefun void mpz_abs (mpz_t @var{rop}, const mpz_t @var{op})
  3270  Set @var{rop} to the absolute value of @var{op}.
  3271  @end deftypefun
  3272  
  3273  
  3274  @need 2000
  3275  @node Integer Division, Integer Exponentiation, Integer Arithmetic, Integer Functions
  3276  @section Division Functions
  3277  @cindex Integer division functions
  3278  @cindex Division functions
  3279  
  3280  Division is undefined if the divisor is zero.  Passing a zero divisor to the
  3281  division or modulo functions (including the modular powering functions
  3282  @code{mpz_powm} and @code{mpz_powm_ui}), will cause an intentional division by
  3283  zero.  This lets a program handle arithmetic exceptions in these functions the
  3284  same way as for normal C @code{int} arithmetic.
  3285  
  3286  @c  Separate deftypefun groups for cdiv, fdiv and tdiv produce a blank line
  3287  @c  between each, and seem to let tex do a better job of page breaks than an
  3288  @c  @sp 1 in the middle of one big set.
  3289  
  3290  @deftypefun void mpz_cdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
  3291  @deftypefunx void mpz_cdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3292  @deftypefunx void mpz_cdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3293  @maybepagebreak
  3294  @deftypefunx {unsigned long int} mpz_cdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3295  @deftypefunx {unsigned long int} mpz_cdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3296  @deftypefunx {unsigned long int} mpz_cdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}})
  3297  @deftypefunx {unsigned long int} mpz_cdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3298  @maybepagebreak
  3299  @deftypefunx void mpz_cdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
  3300  @deftypefunx void mpz_cdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
  3301  @end deftypefun
  3302  
  3303  @deftypefun void mpz_fdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
  3304  @deftypefunx void mpz_fdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3305  @deftypefunx void mpz_fdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3306  @maybepagebreak
  3307  @deftypefunx {unsigned long int} mpz_fdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3308  @deftypefunx {unsigned long int} mpz_fdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3309  @deftypefunx {unsigned long int} mpz_fdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}})
  3310  @deftypefunx {unsigned long int} mpz_fdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3311  @maybepagebreak
  3312  @deftypefunx void mpz_fdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
  3313  @deftypefunx void mpz_fdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
  3314  @end deftypefun
  3315  
  3316  @deftypefun void mpz_tdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
  3317  @deftypefunx void mpz_tdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3318  @deftypefunx void mpz_tdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3319  @maybepagebreak
  3320  @deftypefunx {unsigned long int} mpz_tdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3321  @deftypefunx {unsigned long int} mpz_tdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3322  @deftypefunx {unsigned long int} mpz_tdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}})
  3323  @deftypefunx {unsigned long int} mpz_tdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3324  @maybepagebreak
  3325  @deftypefunx void mpz_tdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
  3326  @deftypefunx void mpz_tdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
  3327  @cindex Bit shift right
  3328  
  3329  @sp 1
  3330  Divide @var{n} by @var{d}, forming a quotient @var{q} and/or remainder
  3331  @var{r}.  For the @code{2exp} functions, @m{@var{d}=2^b, @var{d}=2^@var{b}}.
  3332  The rounding is in three styles, each suiting different applications.
  3333  
  3334  @itemize @bullet
  3335  @item
  3336  @code{cdiv} rounds @var{q} up towards @m{+\infty, +infinity}, and @var{r} will
  3337  have the opposite sign to @var{d}.  The @code{c} stands for ``ceil''.
  3338  
  3339  @item
  3340  @code{fdiv} rounds @var{q} down towards @m{-\infty, @minus{}infinity}, and
  3341  @var{r} will have the same sign as @var{d}.  The @code{f} stands for
  3342  ``floor''.
  3343  
  3344  @item
  3345  @code{tdiv} rounds @var{q} towards zero, and @var{r} will have the same sign
  3346  as @var{n}.  The @code{t} stands for ``truncate''.
  3347  @end itemize
  3348  
  3349  In all cases @var{q} and @var{r} will satisfy
  3350  @m{@var{n}=@var{q}@var{d}+@var{r}, @var{n}=@var{q}*@var{d}+@var{r}}, and
  3351  @var{r} will satisfy @math{0@le{}@GMPabs{@var{r}}<@GMPabs{@var{d}}}.
  3352  
  3353  The @code{q} functions calculate only the quotient, the @code{r} functions
  3354  only the remainder, and the @code{qr} functions calculate both.  Note that for
  3355  @code{qr} the same variable cannot be passed for both @var{q} and @var{r}, or
  3356  results will be unpredictable.
  3357  
  3358  For the @code{ui} variants the return value is the remainder, and in fact
  3359  returning the remainder is all the @code{div_ui} functions do.  For
  3360  @code{tdiv} and @code{cdiv} the remainder can be negative, so for those the
  3361  return value is the absolute value of the remainder.
  3362  
  3363  For the @code{2exp} variants the divisor is @m{2^b,2^@var{b}}.  These
  3364  functions are implemented as right shifts and bit masks, but of course they
  3365  round the same as the other functions.
  3366  
  3367  For positive @var{n} both @code{mpz_fdiv_q_2exp} and @code{mpz_tdiv_q_2exp}
  3368  are simple bitwise right shifts.  For negative @var{n}, @code{mpz_fdiv_q_2exp}
  3369  is effectively an arithmetic right shift treating @var{n} as twos complement
  3370  the same as the bitwise logical functions do, whereas @code{mpz_tdiv_q_2exp}
  3371  effectively treats @var{n} as sign and magnitude.
  3372  @end deftypefun
  3373  
  3374  @deftypefun void mpz_mod (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
  3375  @deftypefunx {unsigned long int} mpz_mod_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
  3376  Set @var{r} to @var{n} @code{mod} @var{d}.  The sign of the divisor is
  3377  ignored; the result is always non-negative.
  3378  
  3379  @code{mpz_mod_ui} is identical to @code{mpz_fdiv_r_ui} above, returning the
  3380  remainder as well as setting @var{r}.  See @code{mpz_fdiv_ui} above if only
  3381  the return value is wanted.
  3382  @end deftypefun
  3383  
  3384  @deftypefun void mpz_divexact (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
  3385  @deftypefunx void mpz_divexact_ui (mpz_t @var{q}, const mpz_t @var{n}, unsigned long @var{d})
  3386  @cindex Exact division functions
  3387  Set @var{q} to @var{n}/@var{d}.  These functions produce correct results only
  3388  when it is known in advance that @var{d} divides @var{n}.
  3389  
  3390  These routines are much faster than the other division functions, and are the
  3391  best choice when exact division is known to occur, for example reducing a
  3392  rational to lowest terms.
  3393  @end deftypefun
  3394  
  3395  @deftypefun int mpz_divisible_p (const mpz_t @var{n}, const mpz_t @var{d})
  3396  @deftypefunx int mpz_divisible_ui_p (const mpz_t @var{n}, unsigned long int @var{d})
  3397  @deftypefunx int mpz_divisible_2exp_p (const mpz_t @var{n}, mp_bitcnt_t @var{b})
  3398  @cindex Divisibility functions
  3399  Return non-zero if @var{n} is exactly divisible by @var{d}, or in the case of
  3400  @code{mpz_divisible_2exp_p} by @m{2^b,2^@var{b}}.
  3401  
  3402  @var{n} is divisible by @var{d} if there exists an integer @var{q} satisfying
  3403  @math{@var{n} = @var{q}@GMPmultiply{}@var{d}}.  Unlike the other division
  3404  functions, @math{@var{d}=0} is accepted and following the rule it can be seen
  3405  that only 0 is considered divisible by 0.
  3406  @end deftypefun
  3407  
  3408  @deftypefun int mpz_congruent_p (const mpz_t @var{n}, const mpz_t @var{c}, const mpz_t @var{d})
  3409  @deftypefunx int mpz_congruent_ui_p (const mpz_t @var{n}, unsigned long int @var{c}, unsigned long int @var{d})
  3410  @deftypefunx int mpz_congruent_2exp_p (const mpz_t @var{n}, const mpz_t @var{c}, mp_bitcnt_t @var{b})
  3411  @cindex Divisibility functions
  3412  @cindex Congruence functions
  3413  Return non-zero if @var{n} is congruent to @var{c} modulo @var{d}, or in the
  3414  case of @code{mpz_congruent_2exp_p} modulo @m{2^b,2^@var{b}}.
  3415  
  3416  @var{n} is congruent to @var{c} mod @var{d} if there exists an integer @var{q}
  3417  satisfying @math{@var{n} = @var{c} + @var{q}@GMPmultiply{}@var{d}}.  Unlike
  3418  the other division functions, @math{@var{d}=0} is accepted and following the
  3419  rule it can be seen that @var{n} and @var{c} are considered congruent mod 0
  3420  only when exactly equal.
  3421  @end deftypefun
  3422  
  3423  
  3424  @need 2000
  3425  @node Integer Exponentiation, Integer Roots, Integer Division, Integer Functions
  3426  @section Exponentiation Functions
  3427  @cindex Integer exponentiation functions
  3428  @cindex Exponentiation functions
  3429  @cindex Powering functions
  3430  
  3431  @deftypefun void mpz_powm (mpz_t @var{rop}, const mpz_t @var{base}, const mpz_t @var{exp}, const mpz_t @var{mod})
  3432  @deftypefunx void mpz_powm_ui (mpz_t @var{rop}, const mpz_t @var{base}, unsigned long int @var{exp}, const mpz_t @var{mod})
  3433  Set @var{rop} to @m{base^{exp} \bmod mod, (@var{base} raised to @var{exp})
  3434  modulo @var{mod}}.
  3435  
  3436  Negative @var{exp} is supported if an inverse @math{@var{base}^@W{-1} @bmod
  3437  @var{mod}} exists (see @code{mpz_invert} in @ref{Number Theoretic Functions}).
  3438  If an inverse doesn't exist then a divide by zero is raised.
  3439  @end deftypefun
  3440  
  3441  @deftypefun void mpz_powm_sec (mpz_t @var{rop}, const mpz_t @var{base}, const mpz_t @var{exp}, const mpz_t @var{mod})
  3442  Set @var{rop} to @m{base^{exp} \bmod @var{mod}, (@var{base} raised to @var{exp})
  3443  modulo @var{mod}}.
  3444  
  3445  It is required that @math{@var{exp} > 0} and that @var{mod} is odd.
  3446  
  3447  This function is designed to take the same time and have the same cache access
  3448  patterns for any two same-size arguments, assuming that function arguments are
  3449  placed at the same position and that the machine state is identical upon
  3450  function entry.  This function is intended for cryptographic purposes, where
  3451  resilience to side-channel attacks is desired.
  3452  @end deftypefun
  3453  
  3454  @deftypefun void mpz_pow_ui (mpz_t @var{rop}, const mpz_t @var{base}, unsigned long int @var{exp})
  3455  @deftypefunx void mpz_ui_pow_ui (mpz_t @var{rop}, unsigned long int @var{base}, unsigned long int @var{exp})
  3456  Set @var{rop} to @m{base^{exp}, @var{base} raised to @var{exp}}.  The case
  3457  @math{0^0} yields 1.
  3458  @end deftypefun
  3459  
  3460  
  3461  @need 2000
  3462  @node Integer Roots, Number Theoretic Functions, Integer Exponentiation, Integer Functions
  3463  @section Root Extraction Functions
  3464  @cindex Integer root functions
  3465  @cindex Root extraction functions
  3466  
  3467  @deftypefun int mpz_root (mpz_t @var{rop}, const mpz_t @var{op}, unsigned long int @var{n})
  3468  Set @var{rop} to @m{\lfloor\root n \of {op}\rfloor@C{},} the truncated integer
  3469  part of the @var{n}th root of @var{op}.  Return non-zero if the computation
  3470  was exact, i.e., if @var{op} is @var{rop} to the @var{n}th power.
  3471  @end deftypefun
  3472  
  3473  @deftypefun void mpz_rootrem (mpz_t @var{root}, mpz_t @var{rem}, const mpz_t @var{u}, unsigned long int @var{n})
  3474  Set @var{root} to @m{\lfloor\root n \of {u}\rfloor@C{},} the truncated
  3475  integer part of the @var{n}th root of @var{u}.  Set @var{rem} to the
  3476  remainder, @m{(@var{u} - @var{root}^n),
  3477  @var{u}@minus{}@var{root}**@var{n}}.
  3478  @end deftypefun
  3479  
  3480  @deftypefun void mpz_sqrt (mpz_t @var{rop}, const mpz_t @var{op})
  3481  Set @var{rop} to @m{\lfloor\sqrt{@var{op}}\rfloor@C{},} the truncated
  3482  integer part of the square root of @var{op}.
  3483  @end deftypefun
  3484  
  3485  @deftypefun void mpz_sqrtrem (mpz_t @var{rop1}, mpz_t @var{rop2}, const mpz_t @var{op})
  3486  Set @var{rop1} to @m{\lfloor\sqrt{@var{op}}\rfloor, the truncated integer part
  3487  of the square root of @var{op}}, like @code{mpz_sqrt}.  Set @var{rop2} to the
  3488  remainder @m{(@var{op} - @var{rop1}^2),
  3489  @var{op}@minus{}@var{rop1}*@var{rop1}}, which will be zero if @var{op} is a
  3490  perfect square.
  3491  
  3492  If @var{rop1} and @var{rop2} are the same variable, the results are
  3493  undefined.
  3494  @end deftypefun
  3495  
  3496  @deftypefun int mpz_perfect_power_p (const mpz_t @var{op})
  3497  @cindex Perfect power functions
  3498  @cindex Root testing functions
  3499  Return non-zero if @var{op} is a perfect power, i.e., if there exist integers
  3500  @m{a,@var{a}} and @m{b,@var{b}}, with @m{b>1, @var{b}>1}, such that
  3501  @m{@var{op}=a^b, @var{op} equals @var{a} raised to the power @var{b}}.
  3502  
  3503  Under this definition both 0 and 1 are considered to be perfect powers.
  3504  Negative values of @var{op} are accepted, but of course can only be odd
  3505  perfect powers.
  3506  @end deftypefun
  3507  
  3508  @deftypefun int mpz_perfect_square_p (const mpz_t @var{op})
  3509  @cindex Perfect square functions
  3510  @cindex Root testing functions
  3511  Return non-zero if @var{op} is a perfect square, i.e., if the square root of
  3512  @var{op} is an integer.  Under this definition both 0 and 1 are considered to
  3513  be perfect squares.
  3514  @end deftypefun
  3515  
  3516  
  3517  @need 2000
  3518  @node Number Theoretic Functions, Integer Comparisons, Integer Roots, Integer Functions
  3519  @section Number Theoretic Functions
  3520  @cindex Number theoretic functions
  3521  
  3522  @deftypefun int mpz_probab_prime_p (const mpz_t @var{n}, int @var{reps})
  3523  @cindex Prime testing functions
  3524  @cindex Probable prime testing functions
  3525  Determine whether @var{n} is prime.  Return 2 if @var{n} is definitely prime,
  3526  return 1 if @var{n} is probably prime (without being certain), or return 0 if
  3527  @var{n} is definitely non-prime.
  3528  
  3529  This function performs some trial divisions, then @var{reps} Miller-Rabin
  3530  probabilistic primality tests.  A higher @var{reps} value will reduce the
  3531  chances of a non-prime being identified as ``probably prime''.  A composite
  3532  number will be identified as a prime with a probability of less than
  3533  @m{4^{-reps},4^(-@var{reps})}.  Reasonable values of @var{reps} are between 15
  3534  and 50.
  3535  @end deftypefun
  3536  
  3537  @deftypefun void mpz_nextprime (mpz_t @var{rop}, const mpz_t @var{op})
  3538  @cindex Next prime function
  3539  Set @var{rop} to the next prime greater than @var{op}.
  3540  
  3541  This function uses a probabilistic algorithm to identify primes.  For
  3542  practical purposes it's adequate; the chance of a composite passing will be
  3543  extremely small.
  3544  @end deftypefun
  3545  
  3546  @c mpz_prime_p not implemented as of gmp 3.0.
  3547  
  3548  @c @deftypefun int mpz_prime_p (const mpz_t @var{n})
  3549  @c Return non-zero if @var{n} is prime and zero if @var{n} is a non-prime.
  3550  @c This function is far slower than @code{mpz_probab_prime_p}, but then it
  3551  @c never returns non-zero for composite numbers.
  3552  
  3553  @c (For practical purposes, using @code{mpz_probab_prime_p} is adequate.
  3554  @c The likelihood of a programming error or hardware malfunction is orders
  3555  @c of magnitudes greater than the likelihood for a composite to pass as a
  3556  @c prime, if the @var{reps} argument is in the suggested range.)
  3557  @c @end deftypefun
  3558  
  3559  @deftypefun void mpz_gcd (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3560  @cindex Greatest common divisor functions
  3561  @cindex GCD functions
  3562  Set @var{rop} to the greatest common divisor of @var{op1} and @var{op2}.  The
  3563  result is always positive even if one or both input operands are negative,
  3564  except when both inputs are zero, in which case this function defines @math{gcd(0,0) = 0}.
  3565  @end deftypefun
  3566  
  3567  @deftypefun {unsigned long int} mpz_gcd_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
  3568  Compute the greatest common divisor of @var{op1} and @var{op2}.  If
  3569  @var{rop} is not @code{NULL}, store the result there.
  3570  
  3571  If the result is small enough to fit in an @code{unsigned long int}, it is
  3572  returned.  If the result does not fit, 0 is returned, and the result is equal
  3573  to the argument @var{op1}.  Note that the result will always fit if @var{op2}
  3574  is non-zero.
  3575  @end deftypefun
  3576  
  3577  @deftypefun void mpz_gcdext (mpz_t @var{g}, mpz_t @var{s}, mpz_t @var{t}, const mpz_t @var{a}, const mpz_t @var{b})
  3578  @cindex Extended GCD
  3579  @cindex GCD extended
  3580  Set @var{g} to the greatest common divisor of @var{a} and @var{b}, and in
  3581  addition set @var{s} and @var{t} to coefficients satisfying
  3582  @math{@var{a}@GMPmultiply{}@var{s} + @var{b}@GMPmultiply{}@var{t} = @var{g}}.
  3583  The value in @var{g} is always positive, even if one or both of @var{a} and
  3584  @var{b} are negative (or zero if both inputs are zero).  The values in @var{s}
  3585  and @var{t} are chosen such that normally, @math{@GMPabs{@var{s}} <
  3586  @GMPabs{@var{b}} / (2 @var{g})} and @math{@GMPabs{@var{t}} < @GMPabs{@var{a}}
  3587  / (2 @var{g})}, and these relations define @var{s} and @var{t} uniquely.  There
  3588  are a few exceptional cases:
  3589  
  3590  If @math{@GMPabs{@var{a}} = @GMPabs{@var{b}}}, then @math{@var{s} = 0},
  3591  @math{@var{t} = sgn(@var{b})}.
  3592  
  3593  Otherwise, @math{@var{s} = sgn(@var{a})} if @math{@var{b} = 0} or
  3594  @math{@GMPabs{@var{b}} = 2 @var{g}}, and @math{@var{t} = sgn(@var{b})} if
  3595  @math{@var{a} = 0} or @math{@GMPabs{@var{a}} = 2 @var{g}}.
  3596  
  3597  In all cases, @math{@var{s} = 0} if and only if @math{@var{g} =
  3598  @GMPabs{@var{b}}}, i.e., if @var{b} divides @var{a} or @math{@var{a} = @var{b}
  3599  = 0}.
  3600  
  3601  If @var{t} is @code{NULL} then that value is not computed.
  3602  @end deftypefun
  3603  
  3604  @deftypefun void mpz_lcm (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3605  @deftypefunx void mpz_lcm_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long @var{op2})
  3606  @cindex Least common multiple functions
  3607  @cindex LCM functions
  3608  Set @var{rop} to the least common multiple of @var{op1} and @var{op2}.
  3609  @var{rop} is always positive, irrespective of the signs of @var{op1} and
  3610  @var{op2}.  @var{rop} will be zero if either @var{op1} or @var{op2} is zero.
  3611  @end deftypefun
  3612  
  3613  @deftypefun int mpz_invert (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3614  @cindex Modular inverse functions
  3615  @cindex Inverse modulo functions
  3616  Compute the inverse of @var{op1} modulo @var{op2} and put the result in
  3617  @var{rop}.  If the inverse exists, the return value is non-zero and @var{rop}
  3618  will satisfy @math{0 @le{} @var{rop} < @GMPabs{@var{op2}}} (with @math{@var{rop}
  3619  = 0} possible only when @math{@GMPabs{@var{op2}} = 1}, i.e., in the
  3620  somewhat degenerate zero ring).  If an inverse doesn't
  3621  exist the return value is zero and @var{rop} is undefined.  The behaviour of
  3622  this function is undefined when @var{op2} is zero.
  3623  @end deftypefun
  3624  
  3625  @deftypefun int mpz_jacobi (const mpz_t @var{a}, const mpz_t @var{b})
  3626  @cindex Jacobi symbol functions
  3627  Calculate the Jacobi symbol @m{\left(a \over b\right),
  3628  (@var{a}/@var{b})}.  This is defined only for @var{b} odd.
  3629  @end deftypefun
  3630  
  3631  @deftypefun int mpz_legendre (const mpz_t @var{a}, const mpz_t @var{p})
  3632  @cindex Legendre symbol functions
  3633  Calculate the Legendre symbol @m{\left(a \over p\right),
  3634  (@var{a}/@var{p})}.  This is defined only for @var{p} an odd positive
  3635  prime, and for such @var{p} it's identical to the Jacobi symbol.
  3636  @end deftypefun
  3637  
  3638  @deftypefun int mpz_kronecker (const mpz_t @var{a}, const mpz_t @var{b})
  3639  @deftypefunx int mpz_kronecker_si (const mpz_t @var{a}, long @var{b})
  3640  @deftypefunx int mpz_kronecker_ui (const mpz_t @var{a}, unsigned long @var{b})
  3641  @deftypefunx int mpz_si_kronecker (long @var{a}, const mpz_t @var{b})
  3642  @deftypefunx int mpz_ui_kronecker (unsigned long @var{a}, const mpz_t @var{b})
  3643  @cindex Kronecker symbol functions
  3644  Calculate the Jacobi symbol @m{\left(a \over b\right),
  3645  (@var{a}/@var{b})} with the Kronecker extension @m{\left(a \over
  3646  2\right) = \left(2 \over a\right), (a/2)=(2/a)} when @math{a} odd, or
  3647  @m{\left(a \over 2\right) = 0, (a/2)=0} when @math{a} even.
  3648  
  3649  When @var{b} is odd the Jacobi symbol and Kronecker symbol are
  3650  identical, so @code{mpz_kronecker_ui} etc can be used for mixed
  3651  precision Jacobi symbols too.
  3652  
  3653  For more information see Henri Cohen section 1.4.2 (@pxref{References}),
  3654  or any number theory textbook.  See also the example program
  3655  @file{demos/qcn.c} which uses @code{mpz_kronecker_ui}.
  3656  @end deftypefun
  3657  
  3658  @deftypefun {mp_bitcnt_t} mpz_remove (mpz_t @var{rop}, const mpz_t @var{op}, const mpz_t @var{f})
  3659  @cindex Remove factor functions
  3660  @cindex Factor removal functions
  3661  Remove all occurrences of the factor @var{f} from @var{op} and store the
  3662  result in @var{rop}.  The return value is how many such occurrences were
  3663  removed.
  3664  @end deftypefun
  3665  
  3666  @deftypefun void mpz_fac_ui (mpz_t @var{rop}, unsigned long int @var{n})
  3667  @deftypefunx void mpz_2fac_ui (mpz_t @var{rop}, unsigned long int @var{n})
  3668  @deftypefunx void mpz_mfac_uiui (mpz_t @var{rop}, unsigned long int @var{n}, unsigned long int @var{m})
  3669  @cindex Factorial functions
  3670  Set @var{rop} to the factorial of @var{n}: @code{mpz_fac_ui} computes the plain factorial @var{n}!,
  3671  @code{mpz_2fac_ui} computes the double-factorial @var{n}!!, and @code{mpz_mfac_uiui} the
  3672  @var{m}-multi-factorial @m{n!^{(m)}, @var{n}!^(@var{m})}.
  3673  @end deftypefun
  3674  
  3675  @deftypefun void mpz_primorial_ui (mpz_t @var{rop}, unsigned long int @var{n})
  3676  @cindex Primorial functions
  3677  Set @var{rop} to the primorial of @var{n}, i.e. the product of all positive
  3678  prime numbers @math{@le{}@var{n}}.
  3679  @end deftypefun
  3680  
  3681  @deftypefun void mpz_bin_ui (mpz_t @var{rop}, const mpz_t @var{n}, unsigned long int @var{k})
  3682  @deftypefunx void mpz_bin_uiui (mpz_t @var{rop}, unsigned long int @var{n}, @w{unsigned long int @var{k}})
  3683  @cindex Binomial coefficient functions
  3684  Compute the binomial coefficient @m{\left({n}\atop{k}\right), @var{n} over
  3685  @var{k}} and store the result in @var{rop}.  Negative values of @var{n} are
  3686  supported by @code{mpz_bin_ui}, using the identity
  3687  @m{\left({-n}\atop{k}\right) = (-1)^k \left({n+k-1}\atop{k}\right),
  3688  bin(-n@C{}k) = (-1)^k * bin(n+k-1@C{}k)}, see Knuth volume 1 section 1.2.6
  3689  part G.
  3690  @end deftypefun
  3691  
  3692  @deftypefun void mpz_fib_ui (mpz_t @var{fn}, unsigned long int @var{n})
  3693  @deftypefunx void mpz_fib2_ui (mpz_t @var{fn}, mpz_t @var{fnsub1}, unsigned long int @var{n})
  3694  @cindex Fibonacci sequence functions
  3695  @code{mpz_fib_ui} sets @var{fn} to @m{F_n,F[n]}, the @var{n}'th Fibonacci
  3696  number.  @code{mpz_fib2_ui} sets @var{fn} to @m{F_n,F[n]}, and @var{fnsub1} to
  3697  @m{F_{n-1},F[n-1]}.
  3698  
  3699  These functions are designed for calculating isolated Fibonacci numbers.  When
  3700  a sequence of values is wanted it's best to start with @code{mpz_fib2_ui} and
  3701  iterate the defining @m{F_{n+1} = F_n + F_{n-1}, F[n+1]=F[n]+F[n-1]} or
  3702  similar.
  3703  @end deftypefun
  3704  
  3705  @deftypefun void mpz_lucnum_ui (mpz_t @var{ln}, unsigned long int @var{n})
  3706  @deftypefunx void mpz_lucnum2_ui (mpz_t @var{ln}, mpz_t @var{lnsub1}, unsigned long int @var{n})
  3707  @cindex Lucas number functions
  3708  @code{mpz_lucnum_ui} sets @var{ln} to @m{L_n,L[n]}, the @var{n}'th Lucas
  3709  number.  @code{mpz_lucnum2_ui} sets @var{ln} to @m{L_n,L[n]}, and @var{lnsub1}
  3710  to @m{L_{n-1},L[n-1]}.
  3711  
  3712  These functions are designed for calculating isolated Lucas numbers.  When a
  3713  sequence of values is wanted it's best to start with @code{mpz_lucnum2_ui} and
  3714  iterate the defining @m{L_{n+1} = L_n + L_{n-1}, L[n+1]=L[n]+L[n-1]} or
  3715  similar.
  3716  
  3717  The Fibonacci numbers and Lucas numbers are related sequences, so it's never
  3718  necessary to call both @code{mpz_fib2_ui} and @code{mpz_lucnum2_ui}.  The
  3719  formulas for going from Fibonacci to Lucas can be found in @ref{Lucas Numbers
  3720  Algorithm}, the reverse is straightforward too.
  3721  @end deftypefun
  3722  
  3723  
  3724  @node Integer Comparisons, Integer Logic and Bit Fiddling, Number Theoretic Functions, Integer Functions
  3725  @comment  node-name,  next,  previous,  up
  3726  @section Comparison Functions
  3727  @cindex Integer comparison functions
  3728  @cindex Comparison functions
  3729  
  3730  @deftypefn Function int mpz_cmp (const mpz_t @var{op1}, const mpz_t @var{op2})
  3731  @deftypefnx Function int mpz_cmp_d (const mpz_t @var{op1}, double @var{op2})
  3732  @deftypefnx Macro int mpz_cmp_si (const mpz_t @var{op1}, signed long int @var{op2})
  3733  @deftypefnx Macro int mpz_cmp_ui (const mpz_t @var{op1}, unsigned long int @var{op2})
  3734  Compare @var{op1} and @var{op2}.  Return a positive value if @math{@var{op1} >
  3735  @var{op2}}, zero if @math{@var{op1} = @var{op2}}, or a negative value if
  3736  @math{@var{op1} < @var{op2}}.
  3737  
  3738  @code{mpz_cmp_ui} and @code{mpz_cmp_si} are macros and will evaluate their
  3739  arguments more than once.  @code{mpz_cmp_d} can be called with an infinity,
  3740  but results are undefined for a NaN.
  3741  @end deftypefn
  3742  
  3743  @deftypefn Function int mpz_cmpabs (const mpz_t @var{op1}, const mpz_t @var{op2})
  3744  @deftypefnx Function int mpz_cmpabs_d (const mpz_t @var{op1}, double @var{op2})
  3745  @deftypefnx Function int mpz_cmpabs_ui (const mpz_t @var{op1}, unsigned long int @var{op2})
  3746  Compare the absolute values of @var{op1} and @var{op2}.  Return a positive
  3747  value if @math{@GMPabs{@var{op1}} > @GMPabs{@var{op2}}}, zero if
  3748  @math{@GMPabs{@var{op1}} = @GMPabs{@var{op2}}}, or a negative value if
  3749  @math{@GMPabs{@var{op1}} < @GMPabs{@var{op2}}}.
  3750  
  3751  @code{mpz_cmpabs_d} can be called with an infinity, but results are undefined
  3752  for a NaN.
  3753  @end deftypefn
  3754  
  3755  @deftypefn Macro int mpz_sgn (const mpz_t @var{op})
  3756  @cindex Sign tests
  3757  @cindex Integer sign tests
  3758  Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
  3759  @math{-1} if @math{@var{op} < 0}.
  3760  
  3761  This function is actually implemented as a macro.  It evaluates its argument
  3762  multiple times.
  3763  @end deftypefn
  3764  
  3765  
  3766  @node Integer Logic and Bit Fiddling, I/O of Integers, Integer Comparisons, Integer Functions
  3767  @comment  node-name,  next,  previous,  up
  3768  @section Logical and Bit Manipulation Functions
  3769  @cindex Logical functions
  3770  @cindex Bit manipulation functions
  3771  @cindex Integer logical functions
  3772  @cindex Integer bit manipulation functions
  3773  
  3774  These functions behave as if twos complement arithmetic were used (although
  3775  sign-magnitude is the actual implementation).  The least significant bit is
  3776  number 0.
  3777  
  3778  @deftypefun void mpz_and (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3779  Set @var{rop} to @var{op1} bitwise-and @var{op2}.
  3780  @end deftypefun
  3781  
  3782  @deftypefun void mpz_ior (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3783  Set @var{rop} to @var{op1} bitwise inclusive-or @var{op2}.
  3784  @end deftypefun
  3785  
  3786  @deftypefun void mpz_xor (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
  3787  Set @var{rop} to @var{op1} bitwise exclusive-or @var{op2}.
  3788  @end deftypefun
  3789  
  3790  @deftypefun void mpz_com (mpz_t @var{rop}, const mpz_t @var{op})
  3791  Set @var{rop} to the one's complement of @var{op}.
  3792  @end deftypefun
  3793  
  3794  @deftypefun {mp_bitcnt_t} mpz_popcount (const mpz_t @var{op})
  3795  If @math{@var{op}@ge{}0}, return the population count of @var{op}, which is the
  3796  number of 1 bits in the binary representation.  If @math{@var{op}<0}, the
  3797  number of 1s is infinite, and the return value is the largest possible
  3798  @code{mp_bitcnt_t}.
  3799  @end deftypefun
  3800  
  3801  @deftypefun {mp_bitcnt_t} mpz_hamdist (const mpz_t @var{op1}, const mpz_t @var{op2})
  3802  If @var{op1} and @var{op2} are both @math{@ge{}0} or both @math{<0}, return the
  3803  hamming distance between the two operands, which is the number of bit positions
  3804  where @var{op1} and @var{op2} have different bit values.  If one operand is
  3805  @math{@ge{}0} and the other @math{<0} then the number of bits different is
  3806  infinite, and the return value is the largest possible @code{mp_bitcnt_t}.
  3807  @end deftypefun
  3808  
  3809  @deftypefun {mp_bitcnt_t} mpz_scan0 (const mpz_t @var{op}, mp_bitcnt_t @var{starting_bit})
  3810  @deftypefunx {mp_bitcnt_t} mpz_scan1 (const mpz_t @var{op}, mp_bitcnt_t @var{starting_bit})
  3811  @cindex Bit scanning functions
  3812  @cindex Scan bit functions
  3813  Scan @var{op}, starting from bit @var{starting_bit}, towards more significant
  3814  bits, until the first 0 or 1 bit (respectively) is found.  Return the index of
  3815  the found bit.
  3816  
  3817  If the bit at @var{starting_bit} is already what's sought, then
  3818  @var{starting_bit} is returned.
  3819  
  3820  If there's no bit found, then the largest possible @code{mp_bitcnt_t} is
  3821  returned.  This will happen in @code{mpz_scan0} past the end of a negative
  3822  number, or @code{mpz_scan1} past the end of a nonnegative number.
  3823  @end deftypefun
  3824  
  3825  @deftypefun void mpz_setbit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index})
  3826  Set bit @var{bit_index} in @var{rop}.
  3827  @end deftypefun
  3828  
  3829  @deftypefun void mpz_clrbit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index})
  3830  Clear bit @var{bit_index} in @var{rop}.
  3831  @end deftypefun
  3832  
  3833  @deftypefun void mpz_combit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index})
  3834  Complement bit @var{bit_index} in @var{rop}.
  3835  @end deftypefun
  3836  
  3837  @deftypefun int mpz_tstbit (const mpz_t @var{op}, mp_bitcnt_t @var{bit_index})
  3838  Test bit @var{bit_index} in @var{op} and return 0 or 1 accordingly.
  3839  @end deftypefun
  3840  
  3841  @node I/O of Integers, Integer Random Numbers, Integer Logic and Bit Fiddling, Integer Functions
  3842  @comment  node-name,  next,  previous,  up
  3843  @section Input and Output Functions
  3844  @cindex Integer input and output functions
  3845  @cindex Input functions
  3846  @cindex Output functions
  3847  @cindex I/O functions
  3848  
  3849  Functions that perform input from a stdio stream, and functions that output to
  3850  a stdio stream, of @code{mpz} numbers.  Passing a @code{NULL} pointer for a
  3851  @var{stream} argument to any of these functions will make them read from
  3852  @code{stdin} and write to @code{stdout}, respectively.
  3853  
  3854  When using any of these functions, it is a good idea to include @file{stdio.h}
  3855  before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
  3856  for these functions.
  3857  
  3858  See also @ref{Formatted Output} and @ref{Formatted Input}.
  3859  
  3860  @deftypefun size_t mpz_out_str (FILE *@var{stream}, int @var{base}, const mpz_t @var{op})
  3861  Output @var{op} on stdio stream @var{stream}, as a string of digits in base
  3862  @var{base}.  The base argument may vary from 2 to 62 or from @minus{}2 to
  3863  @minus{}36.
  3864  
  3865  For @var{base} in the range 2..36, digits and lower-case letters are used; for
  3866  @minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
  3867  digits, upper-case letters, and lower-case letters (in that significance order)
  3868  are used.
  3869  
  3870  Return the number of bytes written, or if an error occurred, return 0.
  3871  @end deftypefun
  3872  
  3873  @deftypefun size_t mpz_inp_str (mpz_t @var{rop}, FILE *@var{stream}, int @var{base})
  3874  Input a possibly white-space preceded string in base @var{base} from stdio
  3875  stream @var{stream}, and put the read integer in @var{rop}.
  3876  
  3877  The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
  3878  characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
  3879  @code{0B} for binary, @code{0} for octal, or decimal otherwise.
  3880  
  3881  For bases up to 36, case is ignored; upper-case and lower-case letters have
  3882  the same value.  For bases 37 to 62, upper-case letters represent the usual
  3883  10..35 while lower-case letters represent 36..61.
  3884  
  3885  Return the number of bytes read, or if an error occurred, return 0.
  3886  @end deftypefun
  3887  
  3888  @deftypefun size_t mpz_out_raw (FILE *@var{stream}, const mpz_t @var{op})
  3889  Output @var{op} on stdio stream @var{stream}, in raw binary format.  The
  3890  integer is written in a portable format, with 4 bytes of size information, and
  3891  that many bytes of limbs.  Both the size and the limbs are written in
  3892  decreasing significance order (i.e., in big-endian).
  3893  
  3894  The output can be read with @code{mpz_inp_raw}.
  3895  
  3896  Return the number of bytes written, or if an error occurred, return 0.
  3897  
  3898  The output of this cannot be read by @code{mpz_inp_raw} from GMP 1, because
  3899  of changes necessary for compatibility between 32-bit and 64-bit machines.
  3900  @end deftypefun
  3901  
  3902  @deftypefun size_t mpz_inp_raw (mpz_t @var{rop}, FILE *@var{stream})
  3903  Input from stdio stream @var{stream} in the format written by
  3904  @code{mpz_out_raw}, and put the result in @var{rop}.  Return the number of
  3905  bytes read, or if an error occurred, return 0.
  3906  
  3907  This routine can also read the output of @code{mpz_out_raw} from GMP 1, in
  3908  spite of the changes necessary for compatibility between 32-bit and 64-bit
  3909  machines.
  3910  @end deftypefun
  3911  
  3912  
  3913  @need 2000
  3914  @node Integer Random Numbers, Integer Import and Export, I/O of Integers, Integer Functions
  3915  @comment  node-name,  next,  previous,  up
  3916  @section Random Number Functions
  3917  @cindex Integer random number functions
  3918  @cindex Random number functions
  3919  
  3920  The random number functions of GMP come in two groups: older functions
  3921  that rely on a global state, and newer functions that accept a state
  3922  parameter that is read and modified.  Please see @ref{Random Number
  3923  Functions}, for more information on how to use and not to use random
  3924  number functions.
  3925  
  3926  @deftypefun void mpz_urandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{n})
  3927  Generate a uniformly distributed random integer in the range 0 to @m{2^n-1,
  3928  2^@var{n}@minus{}1}, inclusive.
  3929  
  3930  The variable @var{state} must be initialized by calling one of the
  3931  @code{gmp_randinit} functions (@ref{Random State Initialization}) before
  3932  invoking this function.
  3933  @end deftypefun
  3934  
  3935  @deftypefun void mpz_urandomm (mpz_t @var{rop}, gmp_randstate_t @var{state}, const mpz_t @var{n})
  3936  Generate a uniform random integer in the range 0 to @math{@var{n}-1},
  3937  inclusive.
  3938  
  3939  The variable @var{state} must be initialized by calling one of the
  3940  @code{gmp_randinit} functions (@ref{Random State Initialization})
  3941  before invoking this function.
  3942  @end deftypefun
  3943  
  3944  @deftypefun void mpz_rrandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{n})
  3945  Generate a random integer with long strings of zeros and ones in the
  3946  binary representation.  Useful for testing functions and algorithms,
  3947  since random numbers of this kind have proven to be more likely to
  3948  trigger corner-case bugs.  The random number will be in the range
  3949  @m{2^{n-1}, 2^(@var{n}@minus{}1)} to @m{2^n-1, 2^@var{n}@minus{}1}, inclusive.
  3950  
  3951  The variable @var{state} must be initialized by calling one of the
  3952  @code{gmp_randinit} functions (@ref{Random State Initialization})
  3953  before invoking this function.
  3954  @end deftypefun
  3955  
  3956  @deftypefun void mpz_random (mpz_t @var{rop}, mp_size_t @var{max_size})
  3957  Generate a random integer of at most @var{max_size} limbs.  The generated
  3958  random number doesn't satisfy any particular requirements of randomness.
  3959  Negative random numbers are generated when @var{max_size} is negative.
  3960  
  3961  This function is obsolete.  Use @code{mpz_urandomb} or
  3962  @code{mpz_urandomm} instead.
  3963  @end deftypefun
  3964  
  3965  @deftypefun void mpz_random2 (mpz_t @var{rop}, mp_size_t @var{max_size})
  3966  Generate a random integer of at most @var{max_size} limbs, with long strings
  3967  of zeros and ones in the binary representation.  Useful for testing functions
  3968  and algorithms, since random numbers of this kind have proven to be more
  3969  likely to trigger corner-case bugs.  Negative random numbers are generated
  3970  when @var{max_size} is negative.
  3971  
  3972  This function is obsolete.  Use @code{mpz_rrandomb} instead.
  3973  @end deftypefun
  3974  
  3975  
  3976  @node Integer Import and Export, Miscellaneous Integer Functions, Integer Random Numbers, Integer Functions
  3977  @section Integer Import and Export
  3978  
  3979  @code{mpz_t} variables can be converted to and from arbitrary words of binary
  3980  data with the following functions.
  3981  
  3982  @deftypefun void mpz_import (mpz_t @var{rop}, size_t @var{count}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const void *@var{op})
  3983  @cindex Integer import
  3984  @cindex Import
  3985  Set @var{rop} from an array of word data at @var{op}.
  3986  
  3987  The parameters specify the format of the data.  @var{count} many words are
  3988  read, each @var{size} bytes.  @var{order} can be 1 for most significant word
  3989  first or -1 for least significant first.  Within each word @var{endian} can be
  3990  1 for most significant byte first, -1 for least significant first, or 0 for
  3991  the native endianness of the host CPU@.  The most significant @var{nails} bits
  3992  of each word are skipped, this can be 0 to use the full words.
  3993  
  3994  There is no sign taken from the data, @var{rop} will simply be a positive
  3995  integer.  An application can handle any sign itself, and apply it for instance
  3996  with @code{mpz_neg}.
  3997  
  3998  There are no data alignment restrictions on @var{op}, any address is allowed.
  3999  
  4000  Here's an example converting an array of @code{unsigned long} data, most
  4001  significant element first, and host byte order within each value.
  4002  
  4003  @example
  4004  unsigned long  a[20];
  4005  /* Initialize @var{z} and @var{a} */
  4006  mpz_import (z, 20, 1, sizeof(a[0]), 0, 0, a);
  4007  @end example
  4008  
  4009  This example assumes the full @code{sizeof} bytes are used for data in the
  4010  given type, which is usually true, and certainly true for @code{unsigned long}
  4011  everywhere we know of.  However on Cray vector systems it may be noted that
  4012  @code{short} and @code{int} are always stored in 8 bytes (and with
  4013  @code{sizeof} indicating that) but use only 32 or 46 bits.  The @var{nails}
  4014  feature can account for this, by passing for instance
  4015  @code{8*sizeof(int)-INT_BIT}.
  4016  @end deftypefun
  4017  
  4018  @deftypefun {void *} mpz_export (void *@var{rop}, size_t *@var{countp}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const mpz_t @var{op})
  4019  @cindex Integer export
  4020  @cindex Export
  4021  Fill @var{rop} with word data from @var{op}.
  4022  
  4023  The parameters specify the format of the data produced.  Each word will be
  4024  @var{size} bytes and @var{order} can be 1 for most significant word first or
  4025  -1 for least significant first.  Within each word @var{endian} can be 1 for
  4026  most significant byte first, -1 for least significant first, or 0 for the
  4027  native endianness of the host CPU@.  The most significant @var{nails} bits of
  4028  each word are unused and set to zero, this can be 0 to produce full words.
  4029  
  4030  The number of words produced is written to @code{*@var{countp}}, or
  4031  @var{countp} can be @code{NULL} to discard the count.  @var{rop} must have
  4032  enough space for the data, or if @var{rop} is @code{NULL} then a result array
  4033  of the necessary size is allocated using the current GMP allocation function
  4034  (@pxref{Custom Allocation}).  In either case the return value is the
  4035  destination used, either @var{rop} or the allocated block.
  4036  
  4037  If @var{op} is non-zero then the most significant word produced will be
  4038  non-zero.  If @var{op} is zero then the count returned will be zero and
  4039  nothing written to @var{rop}.  If @var{rop} is @code{NULL} in this case, no
  4040  block is allocated, just @code{NULL} is returned.
  4041  
  4042  The sign of @var{op} is ignored, just the absolute value is exported.  An
  4043  application can use @code{mpz_sgn} to get the sign and handle it as desired.
  4044  (@pxref{Integer Comparisons})
  4045  
  4046  There are no data alignment restrictions on @var{rop}, any address is allowed.
  4047  
  4048  When an application is allocating space itself the required size can be
  4049  determined with a calculation like the following.  Since @code{mpz_sizeinbase}
  4050  always returns at least 1, @code{count} here will be at least one, which
  4051  avoids any portability problems with @code{malloc(0)}, though if @code{z} is
  4052  zero no space at all is actually needed (or written).
  4053  
  4054  @example
  4055  numb = 8*size - nails;
  4056  count = (mpz_sizeinbase (z, 2) + numb-1) / numb;
  4057  p = malloc (count * size);
  4058  @end example
  4059  @end deftypefun
  4060  
  4061  
  4062  @need 2000
  4063  @node Miscellaneous Integer Functions, Integer Special Functions, Integer Import and Export, Integer Functions
  4064  @comment  node-name,  next,  previous,  up
  4065  @section Miscellaneous Functions
  4066  @cindex Miscellaneous integer functions
  4067  @cindex Integer miscellaneous functions
  4068  
  4069  @deftypefun int mpz_fits_ulong_p (const mpz_t @var{op})
  4070  @deftypefunx int mpz_fits_slong_p (const mpz_t @var{op})
  4071  @deftypefunx int mpz_fits_uint_p (const mpz_t @var{op})
  4072  @deftypefunx int mpz_fits_sint_p (const mpz_t @var{op})
  4073  @deftypefunx int mpz_fits_ushort_p (const mpz_t @var{op})
  4074  @deftypefunx int mpz_fits_sshort_p (const mpz_t @var{op})
  4075  Return non-zero iff the value of @var{op} fits in an @code{unsigned long int},
  4076  @code{signed long int}, @code{unsigned int}, @code{signed int}, @code{unsigned
  4077  short int}, or @code{signed short int}, respectively.  Otherwise, return zero.
  4078  @end deftypefun
  4079  
  4080  @deftypefn Macro int mpz_odd_p (const mpz_t @var{op})
  4081  @deftypefnx Macro int mpz_even_p (const mpz_t @var{op})
  4082  Determine whether @var{op} is odd or even, respectively.  Return non-zero if
  4083  yes, zero if no.  These macros evaluate their argument more than once.
  4084  @end deftypefn
  4085  
  4086  @deftypefun size_t mpz_sizeinbase (const mpz_t @var{op}, int @var{base})
  4087  @cindex Size in digits
  4088  @cindex Digits in an integer
  4089  Return the size of @var{op} measured in number of digits in the given
  4090  @var{base}.  @var{base} can vary from 2 to 62.  The sign of @var{op} is
  4091  ignored, just the absolute value is used.  The result will be either exact or
  4092  1 too big.  If @var{base} is a power of 2, the result is always exact.  If
  4093  @var{op} is zero the return value is always 1.
  4094  
  4095  This function can be used to determine the space required when converting
  4096  @var{op} to a string.  The right amount of allocation is normally two more
  4097  than the value returned by @code{mpz_sizeinbase}, one extra for a minus sign
  4098  and one for the null-terminator.
  4099  
  4100  @cindex Most significant bit
  4101  It will be noted that @code{mpz_sizeinbase(@var{op},2)} can be used to locate
  4102  the most significant 1 bit in @var{op}, counting from 1.  (Unlike the bitwise
  4103  functions which start from 0, see @ref{Integer Logic and Bit Fiddling,, Logical
  4104  and Bit Manipulation Functions}.)
  4105  @end deftypefun
  4106  
  4107  
  4108  @node Integer Special Functions,  , Miscellaneous Integer Functions, Integer Functions
  4109  @section Special Functions
  4110  @cindex Special integer functions
  4111  @cindex Integer special functions
  4112  
  4113  The functions in this section are for various special purposes.  Most
  4114  applications will not need them.
  4115  
  4116  @deftypefun void mpz_array_init (mpz_t @var{integer_array}, mp_size_t @var{array_size}, @w{mp_size_t @var{fixed_num_bits}})
  4117  @strong{This is an obsolete function.  Do not use it.}
  4118  @end deftypefun
  4119  
  4120  @deftypefun {void *} _mpz_realloc (mpz_t @var{integer}, mp_size_t @var{new_alloc})
  4121  Change the space for @var{integer} to @var{new_alloc} limbs.  The value in
  4122  @var{integer} is preserved if it fits, or is set to 0 if not.  The return
  4123  value is not useful to applications and should be ignored.
  4124  
  4125  @code{mpz_realloc2} is the preferred way to accomplish allocation changes like
  4126  this.  @code{mpz_realloc2} and @code{_mpz_realloc} are the same except that
  4127  @code{_mpz_realloc} takes its size in limbs.
  4128  @end deftypefun
  4129  
  4130  @deftypefun mp_limb_t mpz_getlimbn (const mpz_t @var{op}, mp_size_t @var{n})
  4131  Return limb number @var{n} from @var{op}.  The sign of @var{op} is ignored,
  4132  just the absolute value is used.  The least significant limb is number 0.
  4133  
  4134  @code{mpz_size} can be used to find how many limbs make up @var{op}.
  4135  @code{mpz_getlimbn} returns zero if @var{n} is outside the range 0 to
  4136  @code{mpz_size(@var{op})-1}.
  4137  @end deftypefun
  4138  
  4139  @deftypefun size_t mpz_size (const mpz_t @var{op})
  4140  Return the size of @var{op} measured in number of limbs.  If @var{op} is zero,
  4141  the returned value will be zero.
  4142  @c (@xref{Nomenclature}, for an explanation of the concept @dfn{limb}.)
  4143  @end deftypefun
  4144  
  4145  @deftypefun {const mp_limb_t *} mpz_limbs_read (const mpz_t @var{x})
  4146  Return a pointer to the limb array representing the absolute value of @var{x}.
  4147  The size of the array is @code{mpz_size(@var{x})}. Intended for read access
  4148  only.
  4149  @end deftypefun
  4150  
  4151  @deftypefun {mp_limb_t *} mpz_limbs_write (mpz_t @var{x}, mp_size_t @var{n})
  4152  @deftypefunx {mp_limb_t *} mpz_limbs_modify (mpz_t @var{x}, mp_size_t @var{n})
  4153  Return a pointer to the limb array, intended for write access. The array is
  4154  reallocated as needed, to make room for @var{n} limbs. Requires @math{@var{n}
  4155  > 0}. The @code{mpz_limbs_modify} function returns an array that holds the old
  4156  absolute value of @var{x}, while @code{mpz_limbs_write} may destroy the old
  4157  value and return an array with unspecified contents.
  4158  @end deftypefun
  4159  
  4160  @deftypefun void mpz_limbs_finish (mpz_t @var{x}, mp_size_t @var{s})
  4161  Updates the internal size field of @var{x}. Used after writing to the limb
  4162  array pointer returned by @code{mpz_limbs_write} or @code{mpz_limbs_modify} is
  4163  completed. The array should contain @math{@GMPabs{@var{s}}} valid limbs,
  4164  representing the new absolute value for @var{x}, and the sign of @var{x} is
  4165  taken from the sign of @var{s}. This function never reallocates @var{x}, so
  4166  the limb pointer remains valid.
  4167  @end deftypefun
  4168  
  4169  @c FIXME: Some more useful and less silly example?
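  4170  @example
  4171  void foo (mpz_t x)   /* x must be non-zero, so that 2*n > 0 */
  4172  @{
  4173    mp_size_t n, i;
  4174    mp_limb_t *xp;
  4175  
  4176    n = mpz_size (x);                /* current size in limbs */
  4177    xp = mpz_limbs_modify (x, 2*n);  /* room for 2n limbs, old value kept */
  4178    for (i = 0; i < n; i++)
  4179      xp[n+i] = xp[n-1-i];           /* append the limbs in reverse order */
  4180    mpz_limbs_finish (x, mpz_sgn (x) < 0 ? - 2*n : 2*n);  /* keep the sign */
  4181  @}
  4182  @end example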
  4183  
  4184  @deftypefun mpz_srcptr mpz_roinit_n (mpz_t @var{x}, const mp_limb_t *@var{xp}, mp_size_t @var{xs})
  4185  Special initialization of @var{x}, using the given limb array and size.
  4186  @var{x} should be treated as read-only: it can be passed safely as input to
  4187  any mpz function, but not as an output. The array @var{xp} must point to at
  4188  least one readable limb; its size is
  4189  @math{@GMPabs{@var{xs}}}, and the sign of @var{x} is the sign of @var{xs}. For
  4190  convenience, the function returns @var{x}, but cast to a const pointer type.
  4191  @end deftypefun
  4192  
  4193  @example
  4194  void foo (mpz_t x)
  4195  @{
  4196    static const mp_limb_t y[3] = @{ 0x1, 0x2, 0x3 @};
  4197    mpz_t tmp;
  4198    mpz_add (x, x, mpz_roinit_n (tmp, y, 3));
  4199  @}
  4200  @end example
  4201  
  4202  @deftypefn Macro mpz_t MPZ_ROINIT_N (mp_limb_t *@var{xp}, mp_size_t @var{xs})
  4203  This macro expands to an initializer which can be assigned to an @code{mpz_t}
  4204  variable. The limb array @var{xp} must point to at least one readable limb.
  4205  Moreover, unlike the @code{mpz_roinit_n} function, the array must be
  4206  normalized: if @var{xs} is non-zero, then
  4207  @code{@var{xp}[@math{@GMPabs{@var{xs}}-1}]} must be non-zero. Intended
  4208  primarily for constant values. Using it for non-constant values requires a C
  4209  compiler supporting C99.
  4210  @end deftypefn
  4211  
  4212  @example
  4213  void foo (mpz_t x)
  4214  @{
  4215    static const mp_limb_t ya[3] = @{ 0x1, 0x2, 0x3 @};
  4216    static const mpz_t y = MPZ_ROINIT_N ((mp_limb_t *) ya, 3);
  4217  
  4218    mpz_add (x, x, y);
  4219  @}
  4220  @end example
  4221  
  4222  
  4223  @node Rational Number Functions, Floating-point Functions, Integer Functions, Top
  4224  @comment  node-name,  next,  previous,  up
  4225  @chapter Rational Number Functions
  4226  @cindex Rational number functions
  4227  
  4228  This chapter describes the GMP functions for performing arithmetic on rational
  4229  numbers.  These functions start with the prefix @code{mpq_}.
  4230  
  4231  Rational numbers are stored in objects of type @code{mpq_t}.
  4232  
  4233  All rational arithmetic functions assume operands have a canonical form, and
  4234  canonicalize their result.  The canonical form means that the denominator and
  4235  the numerator have no common factors, and that the denominator is positive.
  4236  Zero has the unique representation 0/1.
  4237  
  4238  Pure assignment functions do not canonicalize the assigned variable.  It is
  4239  the responsibility of the user to canonicalize the assigned variable before
  4240  any arithmetic operations are performed on that variable.
  4241  
  4242  @deftypefun void mpq_canonicalize (mpq_t @var{op})
  4243  Remove any factors that are common to the numerator and denominator of
  4244  @var{op}, and make the denominator positive.
  4245  @end deftypefun
  4246  
  4247  @menu
  4248  * Initializing Rationals::
  4249  * Rational Conversions::
  4250  * Rational Arithmetic::
  4251  * Comparing Rationals::
  4252  * Applying Integer Functions::
  4253  * I/O of Rationals::
  4254  @end menu
  4255  
  4256  @node Initializing Rationals, Rational Conversions, Rational Number Functions, Rational Number Functions
  4257  @comment  node-name,  next,  previous,  up
  4258  @section Initialization and Assignment Functions
  4259  @cindex Rational assignment functions
  4260  @cindex Assignment functions
  4261  @cindex Rational initialization functions
  4262  @cindex Initialization functions
  4263  
  4264  @deftypefun void mpq_init (mpq_t @var{x})
  4265  Initialize @var{x} and set it to 0/1.  Each variable should normally only be
  4266  initialized once, or at least cleared out (using the function @code{mpq_clear})
  4267  between each initialization.
  4268  @end deftypefun
  4269  
  4270  @deftypefun void mpq_inits (mpq_t @var{x}, ...)
  4271  Initialize a NULL-terminated list of @code{mpq_t} variables, and set their
  4272  values to 0/1.
  4273  @end deftypefun
  4274  
  4275  @deftypefun void mpq_clear (mpq_t @var{x})
  4276  Free the space occupied by @var{x}.  Make sure to call this function for all
  4277  @code{mpq_t} variables when you are done with them.
  4278  @end deftypefun
  4279  
  4280  @deftypefun void mpq_clears (mpq_t @var{x}, ...)
  4281  Free the space occupied by a NULL-terminated list of @code{mpq_t} variables.
  4282  @end deftypefun
  4283  
  4284  @deftypefun void mpq_set (mpq_t @var{rop}, const mpq_t @var{op})
  4285  @deftypefunx void mpq_set_z (mpq_t @var{rop}, const mpz_t @var{op})
  4286  Assign @var{rop} from @var{op}.
  4287  @end deftypefun
  4288  
  4289  @deftypefun void mpq_set_ui (mpq_t @var{rop}, unsigned long int @var{op1}, unsigned long int @var{op2})
  4290  @deftypefunx void mpq_set_si (mpq_t @var{rop}, signed long int @var{op1}, unsigned long int @var{op2})
  4291  Set the value of @var{rop} to @var{op1}/@var{op2}.  Note that if @var{op1} and
  4292  @var{op2} have common factors, @var{rop} has to be passed to
  4293  @code{mpq_canonicalize} before any operations are performed on @var{rop}.
  4294  @end deftypefun
  4295  
  4296  @deftypefun int mpq_set_str (mpq_t @var{rop}, const char *@var{str}, int @var{base})
  4297  Set @var{rop} from a null-terminated string @var{str} in the given @var{base}.
  4298  
  4299  The string can be an integer like ``41'' or a fraction like ``41/152''.  The
  4300  fraction must be in canonical form (@pxref{Rational Number Functions}), or if
  4301  not then @code{mpq_canonicalize} must be called.
  4302  
  4303  The numerator and optional denominator are parsed the same as in
  4304  @code{mpz_set_str} (@pxref{Assigning Integers}).  White space is allowed in
  4305  the string, and is simply ignored.  The @var{base} can vary from 2 to 62, or
  4306  if @var{base} is 0 then the leading characters are used: @code{0x} or @code{0X} for hex,
  4307  @code{0b} or @code{0B} for binary,
  4308  @code{0} for octal, or decimal otherwise.  Note that this is done separately
  4309  for the numerator and denominator, so for instance @code{0xEF/100} is 239/100,
  4310  whereas @code{0xEF/0x100} is 239/256.
  4311  
  4312  The return value is 0 if the entire string is a valid number, or @minus{}1 if
  4313  not.
  4314  @end deftypefun
  4315  
  4316  @deftypefun void mpq_swap (mpq_t @var{rop1}, mpq_t @var{rop2})
  4317  Swap the values @var{rop1} and @var{rop2} efficiently.
  4318  @end deftypefun
  4319  
  4320  
  4321  @need 2000
  4322  @node Rational Conversions, Rational Arithmetic, Initializing Rationals, Rational Number Functions
  4323  @comment  node-name,  next,  previous,  up
  4324  @section Conversion Functions
  4325  @cindex Rational conversion functions
  4326  @cindex Conversion functions
  4327  
  4328  @deftypefun double mpq_get_d (const mpq_t @var{op})
  4329  Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
  4330  towards zero).
  4331  
  4332  If the exponent from the conversion is too big or too small to fit a
  4333  @code{double} then the result is system dependent.  If it is too big, an
  4334  infinity is returned when available; if too small, @math{0.0} is normally
  4335  returned.  Hardware overflow, underflow and denorm traps may or may not occur.
  4336  @end deftypefun
  4337  
  4338  @deftypefun void mpq_set_d (mpq_t @var{rop}, double @var{op})
  4339  @deftypefunx void mpq_set_f (mpq_t @var{rop}, const mpf_t @var{op})
  4340  Set @var{rop} to the value of @var{op}.  There is no rounding; this conversion
  4341  is exact.
  4342  @end deftypefun
  4343  
  4344  @deftypefun {char *} mpq_get_str (char *@var{str}, int @var{base}, const mpq_t @var{op})
  4345  Convert @var{op} to a string of digits in base @var{base}.  The base may vary
  4346  from 2 to 36.  The string will be of the form @samp{num/den}, or if the
  4347  denominator is 1 then just @samp{num}.
  4348  
  4349  If @var{str} is @code{NULL}, the result string is allocated using the current
  4350  allocation function (@pxref{Custom Allocation}).  The block will be
  4351  @code{strlen(str)+1} bytes, that being exactly enough for the string and
  4352  null-terminator.
  4353  
  4354  If @var{str} is not @code{NULL}, it should point to a block of storage large
  4355  enough for the result, that being
  4356  
  4357  @example
  4358  mpz_sizeinbase (mpq_numref(@var{op}), @var{base})
  4359  + mpz_sizeinbase (mpq_denref(@var{op}), @var{base}) + 3
  4360  @end example
  4361  
  4362  The three extra bytes are for a possible minus sign, possible slash, and the
  4363  null-terminator.
  4364  
  4365  A pointer to the result string is returned, being either the allocated block,
  4366  or the given @var{str}.
  4367  @end deftypefun
  4368  
  4369  
  4370  @node Rational Arithmetic, Comparing Rationals, Rational Conversions, Rational Number Functions
  4371  @comment  node-name,  next,  previous,  up
  4372  @section Arithmetic Functions
  4373  @cindex Rational arithmetic functions
  4374  @cindex Arithmetic functions
  4375  
  4376  @deftypefun void mpq_add (mpq_t @var{sum}, const mpq_t @var{addend1}, const mpq_t @var{addend2})
  4377  Set @var{sum} to @var{addend1} + @var{addend2}.
  4378  @end deftypefun
  4379  
  4380  @deftypefun void mpq_sub (mpq_t @var{difference}, const mpq_t @var{minuend}, const mpq_t @var{subtrahend})
  4381  Set @var{difference} to @var{minuend} @minus{} @var{subtrahend}.
  4382  @end deftypefun
  4383  
  4384  @deftypefun void mpq_mul (mpq_t @var{product}, const mpq_t @var{multiplier}, const mpq_t @var{multiplicand})
  4385  Set @var{product} to @math{@var{multiplier} @GMPtimes{} @var{multiplicand}}.
  4386  @end deftypefun
  4387  
  4388  @deftypefun void mpq_mul_2exp (mpq_t @var{rop}, const mpq_t @var{op1}, mp_bitcnt_t @var{op2})
  4389  Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
  4390  @var{op2}}.
  4391  @end deftypefun
  4392  
  4393  @deftypefun void mpq_div (mpq_t @var{quotient}, const mpq_t @var{dividend}, const mpq_t @var{divisor})
  4394  @cindex Division functions
  4395  Set @var{quotient} to @var{dividend}/@var{divisor}.
  4396  @end deftypefun
  4397  
  4398  @deftypefun void mpq_div_2exp (mpq_t @var{rop}, const mpq_t @var{op1}, mp_bitcnt_t @var{op2})
  4399  Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to
  4400  @var{op2}}.
  4401  @end deftypefun
  4402  
  4403  @deftypefun void mpq_neg (mpq_t @var{negated_operand}, const mpq_t @var{operand})
  4404  Set @var{negated_operand} to @minus{}@var{operand}.
  4405  @end deftypefun
  4406  
  4407  @deftypefun void mpq_abs (mpq_t @var{rop}, const mpq_t @var{op})
  4408  Set @var{rop} to the absolute value of @var{op}.
  4409  @end deftypefun
  4410  
  4411  @deftypefun void mpq_inv (mpq_t @var{inverted_number}, const mpq_t @var{number})
  4412  Set @var{inverted_number} to 1/@var{number}.  If the new denominator is
  4413  zero, this routine will divide by zero.
  4414  @end deftypefun
  4415  
  4416  @node Comparing Rationals, Applying Integer Functions, Rational Arithmetic, Rational Number Functions
  4417  @comment  node-name,  next,  previous,  up
  4418  @section Comparison Functions
  4419  @cindex Rational comparison functions
  4420  @cindex Comparison functions
  4421  
  4422  @deftypefun int mpq_cmp (const mpq_t @var{op1}, const mpq_t @var{op2})
  4423  @deftypefunx int mpq_cmp_z (const mpq_t @var{op1}, const mpz_t @var{op2})
  4424  Compare @var{op1} and @var{op2}.  Return a positive value if @math{@var{op1} >
  4425  @var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
  4426  @math{@var{op1} < @var{op2}}.
  4427  
  4428  To determine if two rationals are equal, @code{mpq_equal} is faster than
  4429  @code{mpq_cmp}.
  4430  @end deftypefun
  4431  
  4432  @deftypefn Macro int mpq_cmp_ui (const mpq_t @var{op1}, unsigned long int @var{num2}, unsigned long int @var{den2})
  4433  @deftypefnx Macro int mpq_cmp_si (const mpq_t @var{op1}, long int @var{num2}, unsigned long int @var{den2})
  4434  Compare @var{op1} and @var{num2}/@var{den2}.  Return a positive value if
  4435  @math{@var{op1} > @var{num2}/@var{den2}}, zero if @math{@var{op1} =
  4436  @var{num2}/@var{den2}}, and a negative value if @math{@var{op1} <
  4437  @var{num2}/@var{den2}}.
  4438  
  4439  @var{num2} and @var{den2} are allowed to have common factors.
  4440  
  4441  These functions are implemented as macros and evaluate their arguments
  4442  multiple times.
  4443  @end deftypefn
  4444  
  4445  @deftypefn Macro int mpq_sgn (const mpq_t @var{op})
  4446  @cindex Sign tests
  4447  @cindex Rational sign tests
  4448  Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
  4449  @math{-1} if @math{@var{op} < 0}.
  4450  
  4451  This function is actually implemented as a macro.  It evaluates its
  4452  argument multiple times.
  4453  @end deftypefn
  4454  
  4455  @deftypefun int mpq_equal (const mpq_t @var{op1}, const mpq_t @var{op2})
  4456  Return non-zero if @var{op1} and @var{op2} are equal, zero if they are
  4457  not equal.  Although @code{mpq_cmp} can be used for the same purpose, this
  4458  function is much faster.
  4459  @end deftypefun
  4460  
  4461  @node Applying Integer Functions, I/O of Rationals, Comparing Rationals, Rational Number Functions
  4462  @comment  node-name,  next,  previous,  up
  4463  @section Applying Integer Functions to Rationals
  4464  @cindex Rational numerator and denominator
  4465  @cindex Numerator and denominator
  4466  
  4467  The set of @code{mpq} functions is quite small.  In particular, there are few
  4468  functions for either input or output.  The following functions give direct
  4469  access to the numerator and denominator of an @code{mpq_t}.
  4470  
  4471  Note that if an assignment to the numerator and/or denominator could take an
  4472  @code{mpq_t} out of the canonical form described at the start of this chapter
  4473  (@pxref{Rational Number Functions}) then @code{mpq_canonicalize} must be
  4474  called before any other @code{mpq} functions are applied to that @code{mpq_t}.
  4475  
  4476  @deftypefn Macro mpz_t mpq_numref (const mpq_t @var{op})
  4477  @deftypefnx Macro mpz_t mpq_denref (const mpq_t @var{op})
  4478  Return a reference to the numerator and denominator of @var{op}, respectively.
  4479  The @code{mpz} functions can be used on the result of these macros.
  4480  @end deftypefn
  4481  
  4482  @deftypefun void mpq_get_num (mpz_t @var{numerator}, const mpq_t @var{rational})
  4483  @deftypefunx void mpq_get_den (mpz_t @var{denominator}, const mpq_t @var{rational})
  4484  @deftypefunx void mpq_set_num (mpq_t @var{rational}, const mpz_t @var{numerator})
  4485  @deftypefunx void mpq_set_den (mpq_t @var{rational}, const mpz_t @var{denominator})
  4486  Get or set the numerator or denominator of a rational.  These functions are
  4487  equivalent to calling @code{mpz_set} with an appropriate @code{mpq_numref} or
  4488  @code{mpq_denref}.  Direct use of @code{mpq_numref} or @code{mpq_denref} is
  4489  recommended instead of these functions.
  4490  @end deftypefun
  4491  
  4492  
  4493  @need 2000
  4494  @node I/O of Rationals,  , Applying Integer Functions, Rational Number Functions
  4495  @comment  node-name,  next,  previous,  up
  4496  @section Input and Output Functions
  4497  @cindex Rational input and output functions
  4498  @cindex Input functions
  4499  @cindex Output functions
  4500  @cindex I/O functions
  4501  
  4502  The functions in this section read @code{mpq} numbers from a stdio stream
  4503  or write them to one.  Passing a @code{NULL} pointer for a
  4504  @var{stream} argument to any of these functions will make them read from
  4505  @code{stdin} or write to @code{stdout}, respectively.
  4506  
  4507  When using any of these functions, it is a good idea to include @file{stdio.h}
  4508  before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
  4509  for these functions.
  4510  
  4511  See also @ref{Formatted Output} and @ref{Formatted Input}.
  4512  
  4513  @deftypefun size_t mpq_out_str (FILE *@var{stream}, int @var{base}, const mpq_t @var{op})
  4514  Output @var{op} on stdio stream @var{stream}, as a string of digits in base
  4515  @var{base}.  The base may vary from 2 to 36.  Output is in the form
  4516  @samp{num/den} or if the denominator is 1 then just @samp{num}.
  4517  
  4518  Return the number of bytes written, or if an error occurred, return 0.
  4519  @end deftypefun
  4520  
  4521  @deftypefun size_t mpq_inp_str (mpq_t @var{rop}, FILE *@var{stream}, int @var{base})
  4522  Read a string of digits from @var{stream} and convert them to a rational in
  4523  @var{rop}.  Any initial white-space characters are read and discarded.  Return
  4524  the number of characters read (including white space), or 0 if a rational
  4525  could not be read.
  4526  
  4527  The input can be a fraction like @samp{17/63} or just an integer like
  4528  @samp{123}.  Reading stops at the first character not in this form, and white
  4529  space is not permitted within the string.  If the input might not be in
  4530  canonical form, then @code{mpq_canonicalize} must be called (@pxref{Rational
  4531  Number Functions}).
  4532  
  4533  The @var{base} can be between 2 and 36, or can be 0 in which case the leading
  4534  characters of the string determine the base, @samp{0x} or @samp{0X} for
  4535  hexadecimal, @samp{0} for octal, or decimal otherwise.  The leading characters
  4536  are examined separately for the numerator and denominator of a fraction, so
  4537  for instance @samp{0x10/11} is @math{16/11}, whereas @samp{0x10/0x11} is
  4538  @math{16/17}.
  4539  @end deftypefun
  4540  
  4541  
  4542  @node Floating-point Functions, Low-level Functions, Rational Number Functions, Top
  4543  @comment  node-name,  next,  previous,  up
  4544  @chapter Floating-point Functions
  4545  @cindex Floating-point functions
  4546  @cindex Float functions
  4547  @cindex User-defined precision
  4548  @cindex Precision of floats
  4549  
  4550  GMP floating point numbers are stored in objects of type @code{mpf_t} and
  4551  functions operating on them have an @code{mpf_} prefix.
  4552  
  4553  The mantissa of each float has a user-selectable precision, in practice only
  4554  limited by available memory.  Each variable has its own precision, and that can
  4555  be increased or decreased at any time.  This selectable precision is a minimum
  4556  value; GMP rounds it up to a whole limb.
  4557  
  4558  The accuracy of a calculation is determined by the previously set precision of the
  4559  destination variable and the numeric values of the input variables.  Input
  4560  variables' set precisions do not affect calculations (except indirectly as
  4561  their values might have been affected when they were assigned).
  4562  
  4563  The exponent of each float has fixed precision, one machine word on most
  4564  systems.  In the current implementation the exponent is a count of limbs, so
  4565  for example on a 32-bit system this means a range of roughly
  4566  @math{2^@W{-68719476768}} to @math{2^@W{68719476736}}, or on a 64-bit system
  4567  this will be much greater.  Note however that @code{mpf_get_str} can only
  4568  return an exponent which fits an @code{mp_exp_t} and currently
  4569  @code{mpf_set_str} doesn't accept exponents bigger than a @code{long}.
  4570  
  4571  Each variable keeps track of the mantissa data actually in use.  This means
  4572  that if a float is exactly represented in only a few bits then only those bits
  4573  will be used in a calculation, even if the variable's selected precision is
  4574  high.  This is a performance optimization; it does not affect the numeric
  4575  results.
  4576  
  4577  Internally, GMP sometimes calculates with higher precision than that of the
  4578  destination variable in order to limit errors.  Final results are always
  4579  truncated to the destination variable's precision.
  4580  
  4581  The mantissa is stored in binary.  One consequence of this is that decimal
  4582  fractions like @math{0.1} cannot be represented exactly.  The same is true of
  4583  plain IEEE @code{double} floats.  This makes both highly unsuitable for
  4584  calculations involving money or other values that should be exact decimal
  4585  fractions.  (Suitably scaled integers, or perhaps rationals, are better
  4586  choices.)
  4587  
  4588  The @code{mpf} functions and variables have no special notion of infinity or
  4589  not-a-number, and applications must take care not to overflow the exponent or
  4590  results will be unpredictable.
  4591  
  4592  Note that the @code{mpf} functions are @emph{not} intended as a smooth
  4593  extension to IEEE 754 arithmetic.  In particular results obtained on one
  4594  computer often differ from the results on a computer with a different word
  4595  size.
  4596  
  4597  New projects should consider using the GMP extension library MPFR
  4598  (@url{http://mpfr.org}) instead.  MPFR provides well-defined precision and
  4599  accurate rounding, and thereby naturally extends IEEE 754.
  4600  
  4601  @menu
  4602  * Initializing Floats::
  4603  * Assigning Floats::
  4604  * Simultaneous Float Init & Assign::
  4605  * Converting Floats::
  4606  * Float Arithmetic::
  4607  * Float Comparison::
  4608  * I/O of Floats::
  4609  * Miscellaneous Float Functions::
  4610  @end menu
  4611  
  4612  @node Initializing Floats, Assigning Floats, Floating-point Functions, Floating-point Functions
  4613  @comment  node-name,  next,  previous,  up
  4614  @section Initialization Functions
  4615  @cindex Float initialization functions
  4616  @cindex Initialization functions
  4617  
  4618  @deftypefun void mpf_set_default_prec (mp_bitcnt_t @var{prec})
  4619  Set the default precision to be @strong{at least} @var{prec} bits.  All
  4620  subsequent calls to @code{mpf_init} will use this precision, but previously
  4621  initialized variables are unaffected.
  4622  @end deftypefun
  4623  
  4624  @deftypefun {mp_bitcnt_t} mpf_get_default_prec (void)
  4625  Return the default precision actually used.
  4626  @end deftypefun
  4627  
  4628  An @code{mpf_t} object must be initialized before storing the first value in
  4629  it.  The functions @code{mpf_init} and @code{mpf_init2} are used for that
  4630  purpose.
  4631  
  4632  @deftypefun void mpf_init (mpf_t @var{x})
  4633  Initialize @var{x} to 0.  Normally, a variable should be initialized once only
  4634  or at least be cleared, using @code{mpf_clear}, between initializations.  The
  4635  precision of @var{x} is undefined unless a default precision has already been
  4636  established by a call to @code{mpf_set_default_prec}.
  4637  @end deftypefun
  4638  
  4639  @deftypefun void mpf_init2 (mpf_t @var{x}, mp_bitcnt_t @var{prec})
  4640  Initialize @var{x} to 0 and set its precision to be @strong{at least}
  4641  @var{prec} bits.  Normally, a variable should be initialized once only or at
  4642  least be cleared, using @code{mpf_clear}, between initializations.
  4643  @end deftypefun
  4644  
  4645  @deftypefun void mpf_inits (mpf_t @var{x}, ...)
  4646  Initialize a NULL-terminated list of @code{mpf_t} variables, and set their
  4647  values to 0.  The precision of the initialized variables is undefined unless a
  4648  default precision has already been established by a call to
  4649  @code{mpf_set_default_prec}.
  4650  @end deftypefun
  4651  
  4652  @deftypefun void mpf_clear (mpf_t @var{x})
  4653  Free the space occupied by @var{x}.  Make sure to call this function for all
  4654  @code{mpf_t} variables when you are done with them.
  4655  @end deftypefun
  4656  
  4657  @deftypefun void mpf_clears (mpf_t @var{x}, ...)
  4658  Free the space occupied by a NULL-terminated list of @code{mpf_t} variables.
  4659  @end deftypefun
  4660  
  4661  @need 2000
  4662  Here is an example of how to initialize floating-point variables:
  4663  @example
  4664  @{
  4665    mpf_t x, y;
  4666    mpf_init (x);           /* use default precision */
  4667    mpf_init2 (y, 256);     /* precision @emph{at least} 256 bits */
  4668    @dots{}
  4669    /* Unless the program is about to exit, do ... */
  4670    mpf_clear (x);
  4671    mpf_clear (y);
  4672  @}
  4673  @end example
  4674  
  4675  The following three functions are useful for changing the precision during a
  4676  calculation.  A typical use would be for adjusting the precision gradually in
  4677  iterative algorithms like Newton-Raphson, making the computation precision
  4678  closely match the actual accurate part of the numbers.
  4679  
  4680  @deftypefun {mp_bitcnt_t} mpf_get_prec (const mpf_t @var{op})
  4681  Return the current precision of @var{op}, in bits.
  4682  @end deftypefun
  4683  
  4684  @deftypefun void mpf_set_prec (mpf_t @var{rop}, mp_bitcnt_t @var{prec})
  4685  Set the precision of @var{rop} to be @strong{at least} @var{prec} bits.  The
  4686  value in @var{rop} will be truncated to the new precision.
  4687  
  4688  This function requires a call to @code{realloc}, and so should not be used in
  4689  a tight loop.
  4690  @end deftypefun
  4691  
  4692  @deftypefun void mpf_set_prec_raw (mpf_t @var{rop}, mp_bitcnt_t @var{prec})
  4693  Set the precision of @var{rop} to be @strong{at least} @var{prec} bits,
  4694  without changing the memory allocated.
  4695  
  4696  @var{prec} must be no more than the allocated precision for @var{rop}, that
  4697  being the precision when @var{rop} was initialized, or in the most recent
  4698  @code{mpf_set_prec}.
  4699  
  4700  The value in @var{rop} is unchanged, and in particular if it had a higher
  4701  precision than @var{prec} it will retain that higher precision.  New values
  4702  written to @var{rop} will use the new @var{prec}.
  4703  
  4704  Before calling @code{mpf_clear} or the full @code{mpf_set_prec}, another
  4705  @code{mpf_set_prec_raw} call must be made to restore @var{rop} to its original
  4706  allocated precision.  Failing to do so will have unpredictable results.
  4707  
  4708  @code{mpf_get_prec} can be used before @code{mpf_set_prec_raw} to get the
  4709  original allocated precision.  After @code{mpf_set_prec_raw} it reflects the
  4710  @var{prec} value set.
  4711  
  4712  @code{mpf_set_prec_raw} is an efficient way to use an @code{mpf_t} variable at
  4713  different precisions during a calculation, perhaps to gradually increase
  4714  precision in an iteration, or just to use various different precisions for
  4715  different purposes during a calculation.
  4716  @end deftypefun
  4717  
  4718  
  4719  @need 2000
  4720  @node Assigning Floats, Simultaneous Float Init & Assign, Initializing Floats, Floating-point Functions
  4721  @comment  node-name,  next,  previous,  up
  4722  @section Assignment Functions
  4723  @cindex Float assignment functions
  4724  @cindex Assignment functions
  4725  
  4726  These functions assign new values to already initialized floats
  4727  (@pxref{Initializing Floats}).
  4728  
  4729  @deftypefun void mpf_set (mpf_t @var{rop}, const mpf_t @var{op})
  4730  @deftypefunx void mpf_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
  4731  @deftypefunx void mpf_set_si (mpf_t @var{rop}, signed long int @var{op})
  4732  @deftypefunx void mpf_set_d (mpf_t @var{rop}, double @var{op})
  4733  @deftypefunx void mpf_set_z (mpf_t @var{rop}, const mpz_t @var{op})
  4734  @deftypefunx void mpf_set_q (mpf_t @var{rop}, const mpq_t @var{op})
  4735  Set the value of @var{rop} from @var{op}.
  4736  @end deftypefun
  4737  
  4738  @deftypefun int mpf_set_str (mpf_t @var{rop}, const char *@var{str}, int @var{base})
  4739  Set the value of @var{rop} from the string in @var{str}.  The string is of the
  4740  form @samp{M@@N} or, if the base is 10 or less, alternatively @samp{MeN}.
  4741  @samp{M} is the mantissa and @samp{N} is the exponent.  The mantissa is always
  4742  in the specified base.  The exponent is either in the specified base or, if
  4743  @var{base} is negative, in decimal.  The decimal point expected is taken from
  4744  the current locale, on systems providing @code{localeconv}.
  4745  
  4746  The argument @var{base} may be in the ranges 2 to 62, or @minus{}62 to
  4747  @minus{}2.  Negative values are used to specify that the exponent is in
  4748  decimal.
  4749  
  4750  For bases up to 36, case is ignored; upper-case and lower-case letters have
  4751  the same value; for bases 37 to 62, upper-case letters represent the usual
  4752  10..35 while lower-case letters represent 36..61.
  4753  
  4754  Unlike the corresponding @code{mpz} function, the base will not be determined
  4755  from the leading characters of the string if @var{base} is 0.  This is so that
  4756  numbers like @samp{0.23} are not interpreted as octal.
  4757  
  4758  White space is ignored at the beginning of the string and within the
  4759  mantissa, but not in other places, such as after a minus sign or in the
  4760  exponent.  [We are considering changing the definition of this function,
  4761  making it fail when there is any white space in the input, since that makes
  4762  a lot of sense.  Please tell us your opinion about this change.  Do you
  4763  really want it to accept @nicode{"3 14"} as meaning 314 as it does
  4764  now?]
  4765  
  4766  This function returns 0 if the entire string is a valid number in base
  4767  @var{base}.  Otherwise it returns @minus{}1.
  4768  @end deftypefun
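
The letter rules described for @code{mpf_set_str} can be sketched in plain C.
The helper below is a hypothetical illustration and not part of GMP: the name
@code{digit_value} is invented here, and the code assumes an ASCII-compatible
character set in which the letters are contiguous.

```c
/* Hypothetical helper (not part of GMP): map one character to its digit
   value under the letter rules described for mpf_set_str.  Returns -1 if
   the character is not a valid digit in the given base.  A negative base
   only changes how the exponent is read, so its magnitude is used here.
   Assumes ASCII, where '0'..'9', 'A'..'Z' and 'a'..'z' are contiguous. */
int digit_value(char c, int base)
{
    int b = base < 0 ? -base : base;
    int v;

    if (b < 2 || b > 62)
        return -1;                      /* outside the documented ranges */
    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'A' && c <= 'Z')
        v = 10 + (c - 'A');             /* upper case is always 10..35 */
    else if (c >= 'a' && c <= 'z')
        v = (b <= 36) ? 10 + (c - 'a')  /* case ignored up to base 36 */
                      : 36 + (c - 'a'); /* 36..61 for bases 37..62 */
    else
        return -1;
    return v < b ? v : -1;              /* digit must be below the base */
}
```

Under these assumptions @code{digit_value('f', 16)} and
@code{digit_value('F', 16)} both give 15, while in base 62 the letter
@samp{a} gives 36 and @samp{A} gives 10.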
  4769  
  4770  @deftypefun void mpf_swap (mpf_t @var{rop1}, mpf_t @var{rop2})
  4771  Swap @var{rop1} and @var{rop2} efficiently.  Both the values and the
  4772  precisions of the two variables are swapped.
  4773  @end deftypefun
  4774  
  4775  
  4776  @node Simultaneous Float Init & Assign, Converting Floats, Assigning Floats, Floating-point Functions
  4777  @comment  node-name,  next,  previous,  up
  4778  @section Combined Initialization and Assignment Functions
  4779  @cindex Float assignment functions
  4780  @cindex Assignment functions
  4781  @cindex Float initialization functions
  4782  @cindex Initialization functions
  4783  
  4784  For convenience, GMP provides a parallel series of initialize-and-set functions
  4785  which initialize the output and then store the value there.  These functions'
  4786  names have the form @code{mpf_init_set@dots{}}.
  4787  
  4788  Once the float has been initialized by any of the @code{mpf_init_set@dots{}}
  4789  functions, it can be used as the source or destination operand for the ordinary
  4790  float functions.  Don't use an initialize-and-set function on a variable
  4791  already initialized!
  4792  
  4793  @deftypefun void mpf_init_set (mpf_t @var{rop}, const mpf_t @var{op})
  4794  @deftypefunx void mpf_init_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
  4795  @deftypefunx void mpf_init_set_si (mpf_t @var{rop}, signed long int @var{op})
  4796  @deftypefunx void mpf_init_set_d (mpf_t @var{rop}, double @var{op})
  4797  Initialize @var{rop} and set its value from @var{op}.
  4798  
  4799  The precision of @var{rop} will be taken from the active default precision, as
  4800  set by @code{mpf_set_default_prec}.
  4801  @end deftypefun
  4802  
  4803  @deftypefun int mpf_init_set_str (mpf_t @var{rop}, const char *@var{str}, int @var{base})
  4804  Initialize @var{rop} and set its value from the string in @var{str}.  See
  4805  @code{mpf_set_str} above for details on the assignment operation.
  4806  
  4807  Note that @var{rop} is initialized even if an error occurs.  (I.e., you have to
  4808  call @code{mpf_clear} for it.)
  4809  
  4810  The precision of @var{rop} will be taken from the active default precision, as
  4811  set by @code{mpf_set_default_prec}.
  4812  @end deftypefun
  4813  
  4814  
  4815  @node Converting Floats, Float Arithmetic, Simultaneous Float Init & Assign, Floating-point Functions
  4816  @comment  node-name,  next,  previous,  up
  4817  @section Conversion Functions
  4818  @cindex Float conversion functions
  4819  @cindex Conversion functions
  4820  
  4821  @deftypefun double mpf_get_d (const mpf_t @var{op})
  4822  Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
  4823  towards zero).
  4824  
  4825  If the exponent in @var{op} is too big or too small to fit a @code{double}
  4826  then the result is system dependent.  If it is too big, an infinity is
  4827  returned when available.  If it is too small, @math{0.0} is normally returned.
  4828  Hardware overflow, underflow and denorm traps may or may not occur.
  4829  @end deftypefun
  4830  
  4831  @deftypefun double mpf_get_d_2exp (signed long int *@var{exp}, const mpf_t @var{op})
  4832  Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
  4833  towards zero), and with an exponent returned separately.
  4834  
  4835  The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the
  4836  exponent is stored to @code{*@var{exp}}.  @m{@var{d} \times 2^{exp},
  4837  @var{d} * 2^@var{exp}} is the (truncated) @var{op} value.  If @var{op} is zero,
  4838  the return is @math{0.0} and 0 is stored to @code{*@var{exp}}.
  4839  
  4840  @cindex @code{frexp}
  4841  This is similar to the standard C @code{frexp} function (@pxref{Normalization
  4842  Functions,,, libc, The GNU C Library Reference Manual}).
  4843  @end deftypefun
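
The fraction-and-exponent convention of @code{mpf_get_d_2exp} can be modelled
on an ordinary @code{double}.  The sketch below is hypothetical and is not GMP
code: the name @code{split_2exp} is invented here, and it assumes a finite
input (an infinity would never leave the first loop).

```c
/* Hypothetical model (not GMP code): split a finite double x into a
   fraction d with 0.5 <= |d| < 1 and a power-of-two exponent, so that
   x == d * 2^(*exp).  Zero gives 0.0 with exponent 0, matching the
   convention described for mpf_get_d_2exp.  x must be finite. */
double split_2exp(double x, long *exp)
{
    long e = 0;

    if (x != 0.0) {
        /* Halve until the magnitude drops below 1. */
        while (x >= 1.0 || x <= -1.0) { x /= 2.0; e++; }
        /* Double until the magnitude reaches at least 0.5. */
        while (x < 0.5 && x > -0.5)   { x *= 2.0; e--; }
    }
    *exp = e;
    return x;
}
```

For instance, splitting 8.0 this way yields the fraction 0.5 with exponent 4,
since 0.5 times 2 raised to 4 is 8.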
  4844  
  4845  @deftypefun long mpf_get_si (const mpf_t @var{op})
  4846  @deftypefunx {unsigned long} mpf_get_ui (const mpf_t @var{op})
  4847  Convert @var{op} to a @code{long} or @code{unsigned long}, truncating any
  4848  fraction part.  If @var{op} is too big for the return type, the result is
  4849  undefined.
  4850  
  4851  See also @code{mpf_fits_slong_p} and @code{mpf_fits_ulong_p}
  4852  (@pxref{Miscellaneous Float Functions}).
  4853  @end deftypefun
  4854  
  4855  @deftypefun {char *} mpf_get_str (char *@var{str}, mp_exp_t *@var{expptr}, int @var{base}, size_t @var{n_digits}, const mpf_t @var{op})
  4856  Convert @var{op} to a string of digits in base @var{base}.  The base argument
  4857  may vary from 2 to 62 or from @minus{}2 to @minus{}36.  Up to @var{n_digits}
  4858  digits will be generated.  Trailing zeros are not returned.  No more digits
  4859  than can be accurately represented by @var{op} are ever generated.  If
  4860  @var{n_digits} is 0 then that accurate maximum number of digits is generated.
  4861  
  4862  For @var{base} in the range 2..36, digits and lower-case letters are used; for
  4863  @minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
  4864  digits, upper-case letters, and lower-case letters (in that significance order)
  4865  are used.
  4866  
  4867  If @var{str} is @code{NULL}, the result string is allocated using the current
  4868  allocation function (@pxref{Custom Allocation}).  The block will be
  4869  @code{strlen(s)+1} bytes, where @code{s} is the returned string, that being
  4870  exactly enough for the string and null-terminator.
  4871  
  4872  If @var{str} is not @code{NULL}, it should point to a block of
  4873  @math{@var{n_digits} + 2} bytes, that being enough for the mantissa, a
  4874  possible minus sign, and a null-terminator.  When @var{n_digits} is 0 to get
  4875  all significant digits, an application won't be able to know the space
  4876  required, and @var{str} should be @code{NULL} in that case.
  4877  
  4878  The generated string is a fraction, with an implicit radix point immediately
  4879  to the left of the first digit.  The applicable exponent is written through
  4880  the @var{expptr} pointer.  For example, the number 3.1416 would be returned as
  4881  string @nicode{"31416"} and exponent 1.
  4882  
  4883  When @var{op} is zero, an empty string is produced and the exponent returned
  4884  is 0.
  4885  
  4886  A pointer to the result string is returned, being either the allocated block
  4887  or the given @var{str}.
  4888  @end deftypefun
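
Because @code{mpf_get_str} returns bare digits plus a separate exponent, an
application usually has to reassemble them for display.  The following is a
minimal sketch of one way to do that; @code{format_fraction} is a hypothetical
helper, not a GMP function, and it assumes the exponent fits in a @code{long}
and that an @samp{e} separator is acceptable (as for bases up to 10).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of GMP): combine a digit string and an
   exponent, in the form described for mpf_get_str, into "0.DIGITSeEXP".
   An empty digit string stands for zero and is rendered as "0".
   Returns the value returned by snprintf. */
int format_fraction(char *out, size_t outsz, const char *digits, long exp)
{
    const char *sign = "";

    if (digits[0] == '\0')
        return snprintf(out, outsz, "0");
    if (digits[0] == '-') {   /* a minus sign may precede the mantissa */
        sign = "-";
        digits++;
    }
    /* The radix point sits immediately to the left of the first digit. */
    return snprintf(out, outsz, "%s0.%se%ld", sign, digits, exp);
}
```

With the digits @nicode{"31416"} and exponent 1 from the example for 3.1416,
this sketch produces @nicode{"0.31416e1"}.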
  4889  
  4890  
  4891  @node Float Arithmetic, Float Comparison, Converting Floats, Floating-point Functions
  4892  @comment  node-name,  next,  previous,  up
  4893  @section Arithmetic Functions
  4894  @cindex Float arithmetic functions
  4895  @cindex Arithmetic functions
  4896  
  4897  @deftypefun void mpf_add (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
  4898  @deftypefunx void mpf_add_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
  4899  Set @var{rop} to @math{@var{op1} + @var{op2}}.
  4900  @end deftypefun
  4901  
  4902  @deftypefun void mpf_sub (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
  4903  @deftypefunx void mpf_ui_sub (mpf_t @var{rop}, unsigned long int @var{op1}, const mpf_t @var{op2})
  4904  @deftypefunx void mpf_sub_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
  4905  Set @var{rop} to @var{op1} @minus{} @var{op2}.
  4906  @end deftypefun
  4907  
  4908  @deftypefun void mpf_mul (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
  4909  @deftypefunx void mpf_mul_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
  4910  Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}.
  4911  @end deftypefun
  4912  
  4913  Division is undefined if the divisor is zero, and passing a zero divisor to the
  4914  divide functions will make these functions intentionally divide by zero.  This
  4915  lets the user handle arithmetic exceptions in these functions in the same
  4916  manner as other arithmetic exceptions.
  4917  
  4918  @deftypefun void mpf_div (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
  4919  @deftypefunx void mpf_ui_div (mpf_t @var{rop}, unsigned long int @var{op1}, const mpf_t @var{op2})
  4920  @deftypefunx void mpf_div_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
  4921  @cindex Division functions
  4922  Set @var{rop} to @var{op1}/@var{op2}.
  4923  @end deftypefun
  4924  
  4925  @deftypefun void mpf_sqrt (mpf_t @var{rop}, const mpf_t @var{op})
  4926  @deftypefunx void mpf_sqrt_ui (mpf_t @var{rop}, unsigned long int @var{op})
  4927  @cindex Root extraction functions
  4928  Set @var{rop} to @m{\sqrt{@var{op}}, the square root of @var{op}}.
  4929  @end deftypefun
  4930  
  4931  @deftypefun void mpf_pow_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
  4932  @cindex Exponentiation functions
  4933  @cindex Powering functions
  4934  Set @var{rop} to @m{@var{op1}^{op2}, @var{op1} raised to the power @var{op2}}.
  4935  @end deftypefun
  4936  
  4937  @deftypefun void mpf_neg (mpf_t @var{rop}, const mpf_t @var{op})
  4938  Set @var{rop} to @minus{}@var{op}.
  4939  @end deftypefun
  4940  
  4941  @deftypefun void mpf_abs (mpf_t @var{rop}, const mpf_t @var{op})
  4942  Set @var{rop} to the absolute value of @var{op}.
  4943  @end deftypefun
  4944  
  4945  @deftypefun void mpf_mul_2exp (mpf_t @var{rop}, const mpf_t @var{op1}, mp_bitcnt_t @var{op2})
  4946  Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
  4947  @var{op2}}.
  4948  @end deftypefun
  4949  
  4950  @deftypefun void mpf_div_2exp (mpf_t @var{rop}, const mpf_t @var{op1}, mp_bitcnt_t @var{op2})
  4951  Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to
  4952  @var{op2}}.
  4953  @end deftypefun
  4954  
  4955  @node Float Comparison, I/O of Floats, Float Arithmetic, Floating-point Functions
  4956  @comment  node-name,  next,  previous,  up
  4957  @section Comparison Functions
  4958  @cindex Float comparison functions
  4959  @cindex Comparison functions
  4960  
  4961  @deftypefun int mpf_cmp (const mpf_t @var{op1}, const mpf_t @var{op2})
  4962  @deftypefunx int mpf_cmp_z (const mpf_t @var{op1}, const mpz_t @var{op2})
  4963  @deftypefunx int mpf_cmp_d (const mpf_t @var{op1}, double @var{op2})
  4964  @deftypefunx int mpf_cmp_ui (const mpf_t @var{op1}, unsigned long int @var{op2})
  4965  @deftypefunx int mpf_cmp_si (const mpf_t @var{op1}, signed long int @var{op2})
  4966  Compare @var{op1} and @var{op2}.  Return a positive value if @math{@var{op1} >
  4967  @var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
  4968  @math{@var{op1} < @var{op2}}.
  4969  
  4970  @code{mpf_cmp_d} can be called with an infinity, but results are undefined for
  4971  a NaN.
  4972  @end deftypefun
  4973  
  4974  @deftypefun int mpf_eq (const mpf_t @var{op1}, const mpf_t @var{op2}, mp_bitcnt_t @var{op3})
  4975  @strong{This function is mathematically ill-defined and should not be used.}
  4976  
  4977  Return non-zero if the first @var{op3} bits of @var{op1} and @var{op2} are
  4978  equal, zero otherwise.  Note that numbers such as 256 (binary 100000000) and
  4979  255 (binary 11111111) will never be equal by this function's measure, and
  4980  furthermore that 0 will only be equal to itself.
  4981  @end deftypefun
  4982  
  4983  @deftypefun void mpf_reldiff (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
  4984  Compute the relative difference between @var{op1} and @var{op2} and store the
  4985  result in @var{rop}.  This is @math{@GMPabs{@var{op1}-@var{op2}}/@var{op1}}.
  4986  @end deftypefun
  4987  
  4988  @deftypefn Macro int mpf_sgn (const mpf_t @var{op})
  4989  @cindex Sign tests
  4990  @cindex Float sign tests
  4991  Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
  4992  @math{-1} if @math{@var{op} < 0}.
  4993  
  4994  This function is actually implemented as a macro.  It evaluates its argument
  4995  multiple times.
  4996  @end deftypefn
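
The remark that @code{mpf_sgn} evaluates its argument multiple times matters
when the argument has side effects.  The sketch below shows the general hazard
with a stand-in macro; @code{SIGN_OF}, @code{counted} and
@code{sign_with_side_effect} are hypothetical names, and the macro is not
GMP's actual definition of @code{mpf_sgn}.

```c
/* Hypothetical stand-in for a sign macro; this is NOT GMP's definition
   of mpf_sgn, only an illustration of multiple argument evaluation. */
#define SIGN_OF(x) ((x) > 0 ? 1 : ((x) < 0 ? -1 : 0))

int eval_count = 0;   /* counts how often the argument expression runs */

/* Return v and record that the expression was evaluated. */
int counted(int v)
{
    eval_count++;
    return v;
}

/* Return the sign of v; afterwards eval_count holds the number of times
   the macro evaluated its argument expression. */
int sign_with_side_effect(int v)
{
    eval_count = 0;
    return SIGN_OF(counted(v));
}
```

For a negative value this stand-in macro evaluates its argument twice, so an
expression with side effects would run twice.  Passing a plain variable avoids
the problem.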
  4997  
  4998  @node I/O of Floats, Miscellaneous Float Functions, Float Comparison, Floating-point Functions
  4999  @comment  node-name,  next,  previous,  up
  5000  @section Input and Output Functions
  5001  @cindex Float input and output functions
  5002  @cindex Input functions
  5003  @cindex Output functions
  5004  @cindex I/O functions
  5005  
  5006  These functions read @code{mpf} numbers from a stdio stream and write them to
  5007  a stdio stream.  Passing a @code{NULL} pointer for a @var{stream} argument
  5008  makes the input function read from @code{stdin} and the output function write
  5009  to @code{stdout}.
  5010  
  5011  When using any of these functions, it is a good idea to include @file{stdio.h}
  5012  before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
  5013  for these functions.
  5014  
  5015  See also @ref{Formatted Output} and @ref{Formatted Input}.
  5016  
  5017  @deftypefun size_t mpf_out_str (FILE *@var{stream}, int @var{base}, size_t @var{n_digits}, const mpf_t @var{op})
  5018  Print @var{op} to @var{stream}, as a string of digits.  Return the number of
  5019  bytes written, or if an error occurred, return 0.
  5020  
  5021  The mantissa is prefixed with a @samp{0.} and is in the given @var{base},
  5022  which may vary from 2 to 62 or from @minus{}2 to @minus{}36.  An exponent is
  5023  then printed, separated by an @samp{e}, or if the base is greater than 10 then
  5024  by an @samp{@@}.  The exponent is always in decimal.  The decimal point follows
  5025  the current locale, on systems providing @code{localeconv}.
  5026  
  5027  For @var{base} in the range 2..36, digits and lower-case letters are used; for
  5028  @minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
  5029  digits, upper-case letters, and lower-case letters (in that significance order)
  5030  are used.
  5031  
  5032  Up to @var{n_digits} digits will be printed from the mantissa, except that no
  5033  more digits than are accurately representable by @var{op} will be printed.
  5034  @var{n_digits} can be 0 to select that accurate maximum.
  5035  @end deftypefun
  5036  
  5037  @deftypefun size_t mpf_inp_str (mpf_t @var{rop}, FILE *@var{stream}, int @var{base})
  5038  Read a string in base @var{base} from @var{stream}, and put the read float in
  5039  @var{rop}.  The string is of the form @samp{M@@N} or, if the base is 10 or
  5040  less, alternatively @samp{MeN}.  @samp{M} is the mantissa and @samp{N} is the
  5041  exponent.  The mantissa is always in the specified base.  The exponent is
  5042  either in the specified base or, if @var{base} is negative, in decimal.  The
  5043  decimal point expected is taken from the current locale, on systems providing
  5044  @code{localeconv}.
  5045  
  5046  The argument @var{base} may be in the ranges 2 to 36, or @minus{}36 to
  5047  @minus{}2.  Negative values are used to specify that the exponent is in
  5048  decimal.
  5049  
  5050  Unlike the corresponding @code{mpz} function, the base will not be determined
  5051  from the leading characters of the string if @var{base} is 0.  This is so that
  5052  numbers like @samp{0.23} are not interpreted as octal.
  5053  
  5054  Return the number of bytes read, or if an error occurred, return 0.
  5055  @end deftypefun
  5056  
  5057  @c @deftypefun void mpf_out_raw (FILE *@var{stream}, const mpf_t @var{float})
  5058  @c Output @var{float} on stdio stream @var{stream}, in raw binary
  5059  @c format.  The float is written in a portable format, with 4 bytes of
  5060  @c size information, and that many bytes of limbs.  Both the size and the
  5061  @c limbs are written in decreasing significance order.
  5062  @c @end deftypefun
  5063  
  5064  @c @deftypefun void mpf_inp_raw (mpf_t @var{float}, FILE *@var{stream})
  5065  @c Input from stdio stream @var{stream} in the format written by
  5066  @c @code{mpf_out_raw}, and put the result in @var{float}.
  5067  @c @end deftypefun
  5068  
  5069  
  5070  @node Miscellaneous Float Functions,  , I/O of Floats, Floating-point Functions
  5071  @comment  node-name,  next,  previous,  up
  5072  @section Miscellaneous Functions
  5073  @cindex Miscellaneous float functions
  5074  @cindex Float miscellaneous functions
  5075  
  5076  @deftypefun void mpf_ceil (mpf_t @var{rop}, const mpf_t @var{op})
  5077  @deftypefunx void mpf_floor (mpf_t @var{rop}, const mpf_t @var{op})
  5078  @deftypefunx void mpf_trunc (mpf_t @var{rop}, const mpf_t @var{op})
  5079  @cindex Rounding functions
  5080  @cindex Float rounding functions
  5081  Set @var{rop} to @var{op} rounded to an integer.  @code{mpf_ceil} rounds to the
  5082  next higher integer, @code{mpf_floor} to the next lower, and @code{mpf_trunc}
  5083  to the integer towards zero.
  5084  @end deftypefun
  5085  
  5086  @deftypefun int mpf_integer_p (const mpf_t @var{op})
  5087  Return non-zero if @var{op} is an integer.
  5088  @end deftypefun
  5089  
  5090  @deftypefun int mpf_fits_ulong_p (const mpf_t @var{op})
  5091  @deftypefunx int mpf_fits_slong_p (const mpf_t @var{op})
  5092  @deftypefunx int mpf_fits_uint_p (const mpf_t @var{op})
  5093  @deftypefunx int mpf_fits_sint_p (const mpf_t @var{op})
  5094  @deftypefunx int mpf_fits_ushort_p (const mpf_t @var{op})
  5095  @deftypefunx int mpf_fits_sshort_p (const mpf_t @var{op})
  5096  Return non-zero if @var{op} would fit in the respective C data type, when
  5097  truncated to an integer.
  5098  @end deftypefun
  5099  
  5100  @deftypefun void mpf_urandomb (mpf_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{nbits})
  5101  @cindex Random number functions
  5102  @cindex Float random number functions
  5103  Generate a uniformly distributed random float in @var{rop}, such that @math{0
  5104  @le{} @var{rop} < 1}, with @var{nbits} significant bits in the mantissa, or
  5105  fewer if the precision of @var{rop} is smaller.
  5106  
  5107  The variable @var{state} must be initialized by calling one of the
  5108  @code{gmp_randinit} functions (@pxref{Random State Initialization}) before
  5109  invoking this function.
  5110  @end deftypefun
  5111  
  5112  @deftypefun void mpf_random2 (mpf_t @var{rop}, mp_size_t @var{max_size}, mp_exp_t @var{exp})
  5113  Generate a random float of at most @var{max_size} limbs, with long strings of
  5114  zeros and ones in the binary representation.  The exponent of the number is in
  5115  the interval @minus{}@var{exp} to @var{exp} (in limbs).  This function is
  5116  useful for testing functions and algorithms, since this kind of random
  5117  number has proven to be more likely to trigger corner-case bugs.  Negative
  5118  random numbers are generated when @var{max_size} is negative.
  5119  @end deftypefun
  5120  
  5121  @c @deftypefun size_t mpf_size (const mpf_t @var{op})
  5122  @c Return the size of @var{op} measured in number of limbs.  If @var{op} is
  5123  @c zero, the returned value will be zero.  (@xref{Nomenclature}, for an
  5124  @c explanation of the concept @dfn{limb}.)
  5125  @c
  5126  @c @strong{This function is obsolete.  It will disappear from future GMP
  5127  @c releases.}
  5128  @c @end deftypefun
  5129  
  5130  
  5131  @node Low-level Functions, Random Number Functions, Floating-point Functions, Top
  5132  @comment  node-name,  next,  previous,  up
  5133  @chapter Low-level Functions
  5134  @cindex Low-level functions
  5135  
  5136  This chapter describes low-level GMP functions, used to implement the
  5137  high-level GMP functions, but also intended for time-critical user code.
  5138  
  5139  These functions start with the prefix @code{mpn_}.
  5140  
  5141  @c 1. Some of these function clobber input operands.
  5142  @c
  5143  
  5144  The @code{mpn} functions are designed to be as fast as possible, @strong{not}
  5145  to provide a coherent calling interface.  The different functions have somewhat
  5146  similar interfaces, but there are variations that make them hard to use.  These
  5147  functions do as little as possible apart from the real multiple precision
  5148  computation, so that no time is spent on things that not all callers need.
  5149  
  5150  A source operand is specified by a pointer to the least significant limb and a
  5151  limb count.  A destination operand is specified by just a pointer.  It is the
  5152  responsibility of the caller to ensure that the destination has enough space
  5153  for storing the result.
  5154  
  5155  With this way of specifying operands, it is possible to perform computations on
  5156  subranges of an argument, and store the result into a subrange of a
  5157  destination.
  5158  
  5159  A common requirement for all functions is that each source area needs at least
  5160  one limb.  No size argument may be zero.  Unless otherwise stated, in-place
  5161  operations are allowed where source and destination are the same, but not where
  5162  they only partly overlap.
  5163  
  5164  The @code{mpn} functions are the base for the implementation of the
  5165  @code{mpz_}, @code{mpf_}, and @code{mpq_} functions.
  5166  
  5167  This example adds the number beginning at @var{s1p} and the number beginning at
  5168  @var{s2p} and writes the sum at @var{destp}.  All areas have @var{n} limbs.
  5169  
  5170  @example
  5171  cy = mpn_add_n (destp, s1p, s2p, n);
  5172  @end example
  5173  
  5174  It should be noted that the @code{mpn} functions make no attempt to identify
  5175  high or low zero limbs on their operands, or other special forms.  On random
  5176  data such cases will be unlikely and it'd be wasteful for every function to
  5177  check every time.  An application knowing something about its data can take
  5178  steps to trim or perhaps split its calculations.
  5179  @c
  5180  @c  For reference, within gmp mpz_t operands never have high zero limbs, and
  5181  @c  we rate low zero limbs as unlikely too (or something an application should
  5182  @c  handle).  This is a prime motivation for not stripping zero limbs in say
  5183  @c  mpn_mul_n etc.
  5184  @c
  5185  @c  Other applications doing variable-length calculations will quite likely do
  5186  @c  something similar to mpz.  And even if not then it's highly likely zero
  5187  @c  limb stripping can be done at just a few judicious points, which will be
  5188  @c  more efficient than having lots of mpn functions checking every time.
  5189  
  5190  @sp 1
  5191  @noindent
  5192  In the notation used below, a source operand is identified by the pointer to
  5193  the least significant limb, and the limb count in braces.  For example,
  5194  @{@var{s1p}, @var{s1n}@}.
  5195  
  5196  @deftypefun mp_limb_t mpn_add_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5197  Add @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the @var{n}
  5198  least significant limbs of the result to @var{rp}.  Return carry, either 0 or
  5199  1.
  5200  
  5201  This is the lowest-level function for addition.  It is the preferred function
  5202  for addition, since it is written in assembly for most CPUs.  For addition of
  5203  a variable to itself (i.e., @var{s1p} equals @var{s2p}) use @code{mpn_lshift}
  5204  with a count of 1 for optimal speed.
  5205  @end deftypefun
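
The contract of @code{mpn_add_n} (limbs stored least significant first, an
@var{n}-limb result, and a carry return of 0 or 1) can be modelled in portable
C.  This is only a sketch of the semantics, not GMP's implementation:
@code{limb_t} and @code{model_add_n} are hypothetical names, and a 64-bit limb
is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed limb type; GMP's mp_limb_t may differ */

/* Hypothetical model of the mpn_add_n contract: add two n-limb numbers
   stored least significant limb first, write n limbs to rp, and return
   the carry out of the top limb (0 or 1).  Each source limb is read
   before rp[i] is written, so rp may equal s1p or s2p. */
limb_t model_add_n(limb_t *rp, const limb_t *s1p, const limb_t *s2p, size_t n)
{
    limb_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t a = s1p[i];
        limb_t sum = a + s2p[i];
        limb_t c1 = sum < a;      /* wrapped around when adding the limbs */
        limb_t res = sum + carry;
        limb_t c2 = res < sum;    /* wrapped around when adding carry-in */
        rp[i] = res;
        carry = c1 | c2;          /* at most one of the two can be set */
    }
    return carry;
}
```

Adding the two-limb values with limbs (all ones, all ones) and (1, 0) in this
model gives a zero result in both limbs and a returned carry of 1.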
  5206  
  5207  @deftypefun mp_limb_t mpn_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
  5208  Add @{@var{s1p}, @var{n}@} and @var{s2limb}, and write the @var{n} least
  5209  significant limbs of the result to @var{rp}.  Return carry, either 0 or 1.
  5210  @end deftypefun
  5211  
  5212  @deftypefun mp_limb_t mpn_add (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
  5213  Add @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
  5214  @var{s1n} least significant limbs of the result to @var{rp}.  Return carry,
  5215  either 0 or 1.
  5216  
  5217  This function requires that @var{s1n} is greater than or equal to @var{s2n}.
  5218  @end deftypefun
  5219  
  5220  @deftypefun mp_limb_t mpn_sub_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5221  Subtract @{@var{s2p}, @var{n}@} from @{@var{s1p}, @var{n}@}, and write the
  5222  @var{n} least significant limbs of the result to @var{rp}.  Return borrow,
  5223  either 0 or 1.
  5224  
  5225  This is the lowest-level function for subtraction.  It is the preferred
  5226  function for subtraction, since it is written in assembly for most CPUs.
  5227  @end deftypefun
  5228  
  5229  @deftypefun mp_limb_t mpn_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
  5230  Subtract @var{s2limb} from @{@var{s1p}, @var{n}@}, and write the @var{n} least
  5231  significant limbs of the result to @var{rp}.  Return borrow, either 0 or 1.
  5232  @end deftypefun
  5233  
  5234  @deftypefun mp_limb_t mpn_sub (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
  5235  Subtract @{@var{s2p}, @var{s2n}@} from @{@var{s1p}, @var{s1n}@}, and write the
  5236  @var{s1n} least significant limbs of the result to @var{rp}.  Return borrow,
  5237  either 0 or 1.
  5238  
  5239  This function requires that @var{s1n} is greater than or equal to
  5240  @var{s2n}.
  5241  @end deftypefun
  5242  
  5243  @deftypefun mp_limb_t mpn_neg (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
  5244  Perform the negation of @{@var{sp}, @var{n}@}, and write the result to
  5245  @{@var{rp}, @var{n}@}.  This is equivalent to calling @code{mpn_sub_n} with
  5246  an @var{n}-limb zero minuend and passing @{@var{sp}, @var{n}@} as subtrahend.
  5247  Return borrow, either 0 or 1.
  5248  @end deftypefun
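
The description of @code{mpn_neg} as a subtraction from zero can be modelled as
follows.  This is a hypothetical sketch of the semantics, not GMP code;
@code{limb_t} and @code{model_neg} are invented names and a 64-bit limb is
assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed limb type; GMP's mp_limb_t may differ */

/* Hypothetical model of the mpn_neg contract: compute 0 - {sp, n}, write
   the n-limb result to rp, and return the borrow out of the top limb. */
limb_t model_neg(limb_t *rp, const limb_t *sp, size_t n)
{
    limb_t borrow = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t s = sp[i];
        rp[i] = (limb_t)0 - s - borrow;
        /* 0 - s - borrow underflows unless both s and borrow are zero. */
        borrow = (limb_t)(s != 0) | borrow;
    }
    return borrow;
}
```

In this model, negating the two-limb value 1 gives all ones in both limbs with
a borrow of 1, and negating zero gives zero with a borrow of 0.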
  5249  
  5250  @deftypefun void mpn_mul_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5251  Multiply @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the
  5252  2*@var{n}-limb result to @var{rp}.
  5253  
  5254  The destination has to have space for 2*@var{n} limbs, even if the product's
  5255  most significant limb is zero.  No overlap is permitted between the
  5256  destination and either source.
  5257  
  5258  If the two input operands are the same, use @code{mpn_sqr}.
  5259  @end deftypefun
  5260  
  5261  @deftypefun mp_limb_t mpn_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
  5262  Multiply @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
  5263  (@var{s1n}+@var{s2n})-limb result to @var{rp}.  Return the most significant
  5264  limb of the result.
  5265  
  5266  The destination has to have space for @var{s1n} + @var{s2n} limbs, even if the
  5267  product's most significant limb is zero.  No overlap is permitted between the
  5268  destination and either source.
  5269  
  5270  This function requires that @var{s1n} is greater than or equal to @var{s2n}.
  5271  @end deftypefun
  5272  
  5273  @deftypefun void mpn_sqr (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
  5274  Compute the square of @{@var{s1p}, @var{n}@} and write the 2*@var{n}-limb
  5275  result to @var{rp}.
  5276  
  5277  The destination has to have space for 2*@var{n} limbs, even if the result's
  5278  most significant limb is zero.  No overlap is permitted between the
  5279  destination and the source.
  5280  @end deftypefun
  5281  
  5282  @deftypefun mp_limb_t mpn_mul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
  5283  Multiply @{@var{s1p}, @var{n}@} by @var{s2limb}, and write the @var{n} least
  5284  significant limbs of the product to @var{rp}.  Return the most significant
  5285  limb of the product.  @{@var{s1p}, @var{n}@} and @{@var{rp}, @var{n}@} are
  5286  allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
  5287  
  5288  This is a low-level function that is a building block for general
  5289  multiplication as well as other operations in GMP@.  It is written in assembly
  5290  for most CPUs.
  5291  
  5292  Don't call this function if @var{s2limb} is a power of 2; use @code{mpn_lshift}
  5293  with a count equal to the logarithm of @var{s2limb} instead, for optimal speed.
  5294  @end deftypefun
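
The behaviour described for @code{mpn_mul_1} (write the @var{n} low limbs of
the product, return the high limb) can be modelled in portable C.  The sketch
is hypothetical and is not how GMP implements it: @code{limb_t} and
@code{model_mul_1} are invented names, and a 32-bit limb is assumed so that
each partial product fits in a @code{uint64_t}.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumed 32-bit limb; GMP's mp_limb_t may differ */

/* Hypothetical model of the mpn_mul_1 contract: multiply the n-limb
   number at s1p by a single limb, write the n least significant limbs
   of the product to rp, and return the most significant limb. */
limb_t model_mul_1(limb_t *rp, const limb_t *s1p, size_t n, limb_t s2limb)
{
    limb_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* limb * limb + carry is at most 2^64 - 2^32, so t cannot overflow. */
        uint64_t t = (uint64_t)s1p[i] * s2limb + carry;
        rp[i] = (limb_t)t;            /* low half stays in place */
        carry = (limb_t)(t >> 32);    /* high half moves to the next limb */
    }
    return carry;
}
```

Multiplying the two-limb value with both limbs equal to 0xFFFFFFFF by 2 in this
model leaves the limbs (0xFFFFFFFE, 0xFFFFFFFF) and returns 1.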
  5295  
  5296  @deftypefun mp_limb_t mpn_addmul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
  5297  Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and add the @var{n} least
  5298  significant limbs of the product to @{@var{rp}, @var{n}@} and write the result
  5299  to @var{rp}.  Return the most significant limb of the product, plus carry-out
  5300  from the addition.  @{@var{s1p}, @var{n}@} and @{@var{rp}, @var{n}@} are
  5301  allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
  5302  
  5303  This is a low-level function that is a building block for general
  5304  multiplication as well as other operations in GMP@.  It is written in assembly
  5305  for most CPUs.
  5306  @end deftypefun
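
To illustrate why @code{mpn_addmul_1} is called a building block for general
multiplication, the sketch below models it and uses the model in a schoolbook
multiply.  All names (@code{limb_t}, @code{model_addmul_1}, @code{model_mul})
are hypothetical, a 32-bit limb is assumed, and GMP's own multiplication code
is considerably more elaborate than this.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumed 32-bit limb; GMP's mp_limb_t may differ */

/* Hypothetical model of the mpn_addmul_1 contract: add s1p[0..n-1] times
   s2limb into rp[0..n-1] and return the limb that overflows out the top. */
limb_t model_addmul_1(limb_t *rp, const limb_t *s1p, size_t n, limb_t s2limb)
{
    limb_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* limb*limb + limb + limb is at most 2^64 - 1, so t cannot overflow. */
        uint64_t t = (uint64_t)s1p[i] * s2limb + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* Schoolbook product of an s1n-limb and an s2n-limb number.  rp must have
   room for s1n + s2n limbs and must not overlap either source. */
void model_mul(limb_t *rp, const limb_t *s1p, size_t s1n,
               const limb_t *s2p, size_t s2n)
{
    for (size_t i = 0; i < s1n; i++)
        rp[i] = 0;
    /* One addmul_1 row per limb of the second operand, shifted by j limbs;
       the row's carry-out becomes the next, so far unwritten, result limb. */
    for (size_t j = 0; j < s2n; j++)
        rp[s1n + j] = model_addmul_1(rp + j, s1p, s1n, s2p[j]);
}
```

In this model the one-limb product 0xFFFFFFFF times 0xFFFFFFFF comes out as the
limbs (0x00000001, 0xFFFFFFFE), least significant first.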
  5307  
  5308  @deftypefun mp_limb_t mpn_submul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
  5309  Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and subtract the @var{n}
  5310  least significant limbs of the product from @{@var{rp}, @var{n}@} and write the
  5311  result to @var{rp}.  Return the most significant limb of the product, plus
  5312  borrow-out from the subtraction.  @{@var{s1p}, @var{n}@} and @{@var{rp},
  5313  @var{n}@} are allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
  5314  
  5315  This is a low-level function that is a building block for general
  5316  multiplication and division as well as other operations in GMP@.  It is written
  5317  in assembly for most CPUs.
  5318  @end deftypefun
  5319  
  5320  @deftypefun void mpn_tdiv_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{rp}, mp_size_t @var{qxn}, const mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn})
  5321  Divide @{@var{np}, @var{nn}@} by @{@var{dp}, @var{dn}@} and put the quotient
  5322  at @{@var{qp}, @var{nn}@minus{}@var{dn}+1@} and the remainder at @{@var{rp},
  5323  @var{dn}@}.  The quotient is rounded towards 0.
  5324  
  5325  No overlap is permitted between arguments, except that @var{np} may equal
  5326  @var{rp}.  The dividend size @var{nn} must be greater than or equal to divisor
  5327  size @var{dn}.  The most significant limb of the divisor must be non-zero.  The
  5328  @var{qxn} operand must be zero.
  5329  @end deftypefun
  5330  
  5331  @deftypefun mp_limb_t mpn_divrem (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
  5332  [This function is obsolete.  Please call @code{mpn_tdiv_qr} instead for best
  5333  performance.]
  5334  
  5335  Divide @{@var{rs2p}, @var{rs2n}@} by @{@var{s3p}, @var{s3n}@}, and write the
  5336  quotient at @var{r1p}, with the exception of the most significant limb, which
  5337  is returned.  The remainder replaces the dividend at @var{rs2p}; it will be
  5338  @var{s3n} limbs long (i.e., as many limbs as the divisor).
  5339  
  5340  In addition to an integer quotient, @var{qxn} fraction limbs are developed, and
  5341  stored after the integral limbs.  For most usages, @var{qxn} will be zero.
  5342  
  5343  It is required that @var{rs2n} is greater than or equal to @var{s3n}.  It is
  5344  required that the most significant bit of the divisor is set.
  5345  
  5346  If the quotient is not needed, pass @var{rs2p} + @var{s3n} as @var{r1p}.  Aside
  5347  from that special case, no overlap between arguments is permitted.
  5348  
  5349  Return the most significant limb of the quotient, either 0 or 1.
  5350  
  5351  The area at @var{r1p} needs to be @var{rs2n} @minus{} @var{s3n} + @var{qxn}
  5352  limbs large.
  5353  @end deftypefun
  5354  
  5355  @deftypefn Function mp_limb_t mpn_divrem_1 (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, @w{mp_limb_t *@var{s2p}}, mp_size_t @var{s2n}, mp_limb_t @var{s3limb})
  5356  @deftypefnx Macro mp_limb_t mpn_divmod_1 (mp_limb_t *@var{r1p}, mp_limb_t *@var{s2p}, @w{mp_size_t @var{s2n}}, @w{mp_limb_t @var{s3limb}})
  5357  Divide @{@var{s2p}, @var{s2n}@} by @var{s3limb}, and write the quotient at
  5358  @var{r1p}.  Return the remainder.
  5359  
  5360  The integer quotient is written to @{@var{r1p}+@var{qxn}, @var{s2n}@} and in
  5361  addition @var{qxn} fraction limbs are developed and written to @{@var{r1p},
  5362  @var{qxn}@}.  Either or both @var{s2n} and @var{qxn} can be zero.  For most
  5363  usages, @var{qxn} will be zero.
  5364  
  5365  @code{mpn_divmod_1} exists for upward source compatibility and is simply a
  5366  macro calling @code{mpn_divrem_1} with a @var{qxn} of 0.
  5367  
  5368  The areas at @var{r1p} and @var{s2p} have to be identical or completely
  5369  separate, not partially overlapping.
  5370  @end deftypefn
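
The quotient layout described for @code{mpn_divrem_1} (integer limbs at
@var{r1p}+@var{qxn}, fraction limbs below them) can be illustrated with a
schoolbook division by a single limb.  The sketch below is not GMP code:
@code{limb_t} (a 32-bit limb) and @code{ref_divrem_1} are hypothetical names,
and it assumes that the returned remainder is the one left after the fraction
limbs have been developed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb, so a two-limb dividend fits in uint64_t. */
typedef uint32_t limb_t;

/* Model: divide {s2p,s2n} by d, most significant limb first.
   Integer quotient goes to {r1p+qxn, s2n}, fraction limbs to {r1p, qxn}. */
limb_t ref_divrem_1(limb_t *r1p, size_t qxn,
                    const limb_t *s2p, size_t s2n, limb_t d)
{
    assert(d != 0);
    uint64_t rem = 0;
    for (size_t i = s2n; i-- > 0; ) {
        uint64_t cur = (rem << 32) | s2p[i];   /* rem < d, so cur / d < 2^32 */
        r1p[qxn + i] = (limb_t)(cur / d);
        rem = cur % d;
    }
    for (size_t i = qxn; i-- > 0; ) {
        uint64_t cur = rem << 32;              /* bring down a zero limb */
        r1p[i] = (limb_t)(cur / d);
        rem = cur % d;
    }
    return (limb_t)rem;
}
```

With @var{qxn} equal to 0 this corresponds to the behaviour described for
@code{mpn_divmod_1}.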
  5371  
  5372  @deftypefun mp_limb_t mpn_divmod (mp_limb_t *@var{r1p}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
  5373  [This function is obsolete.  Please call @code{mpn_tdiv_qr} instead for best
  5374  performance.]
  5375  @end deftypefun
  5376  
  5377  @deftypefun void mpn_divexact_1 (mp_limb_t * @var{rp}, const mp_limb_t * @var{sp}, mp_size_t @var{n}, mp_limb_t @var{d})
  5378  Divide @{@var{sp}, @var{n}@} by @var{d}, expecting it to divide exactly, and
  5379  writing the result to @{@var{rp}, @var{n}@}.  If @var{d} doesn't divide
  5380  exactly, the value written to @{@var{rp}, @var{n}@} is undefined.  The areas at
  5381  @var{rp} and @var{sp} have to be identical or completely separate, not
  5382  partially overlapping.
  5383  @end deftypefun
  5384  
  5385  @deftypefn Macro mp_limb_t mpn_divexact_by3 (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}})
  5386  @deftypefnx Function mp_limb_t mpn_divexact_by3c (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}}, mp_limb_t @var{carry})
  5387  Divide @{@var{sp}, @var{n}@} by 3, expecting it to divide exactly, and writing
  5388  the result to @{@var{rp}, @var{n}@}.  If 3 divides exactly, the return value is
  5389  zero and the result is the quotient.  If not, the return value is non-zero and
  5390  the result won't be anything useful.
  5391  
  5392  @code{mpn_divexact_by3c} takes an initial carry parameter, which can be the
  5393  return value from a previous call, so a large calculation can be done piece by
  5394  piece from low to high.  @code{mpn_divexact_by3} is simply a macro calling
  5395  @code{mpn_divexact_by3c} with a 0 carry parameter.
  5396  
  5397  These routines use a multiply-by-inverse and will be faster than
  5398  @code{mpn_divrem_1} on CPUs with fast multiplication but slow division.
  5399  
  5400  The source @math{a}, result @math{q}, size @math{n}, initial carry @math{i},
  5401  and return value @math{c} satisfy @m{cb^n+a-i=3q, c*b^n + a-i = 3*q}, where
  5402  @m{b=2\GMPraise{@code{GMP\_NUMB\_BITS}}, b=2^GMP_NUMB_BITS}.  The
  5403  return @math{c} is always 0, 1 or 2, and the initial carry @math{i} must also
  5404  be 0, 1 or 2 (these are both borrows really).  When @math{c=0} clearly
  5405  @math{q=(a-i)/3}.  When @m{c \neq 0, c!=0}, the remainder @math{(a-i) @bmod{}
  5406  3} is given by @math{3-c}, because @math{b @equiv{} 1 @bmod{} 3} (when
  5407  @code{mp_bits_per_limb} is even, which is always so currently).
  5408  @end deftypefn
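
The relation @math{c*b^n + a-i = 3*q} can be checked with a small
multiply-by-inverse model.  The sketch below assumes a hypothetical 32-bit limb
type @code{limb_t}, for which @code{0xAAAAAAAB} is the inverse of 3 modulo
@math{2^32}; @code{ref_divexact_by3c} is an illustrative name and the code is
one possible realisation of the technique, not necessarily what GMP uses on any
given CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb; 3 * 0xAAAAAAAB == 1 (mod 2^32). */
typedef uint32_t limb_t;

/* Model: returns c in {0,1,2} with c*b^n + a - carry == 3*q, b = 2^32. */
limb_t ref_divexact_by3c(limb_t *rp, const limb_t *sp, size_t n, limb_t carry)
{
    assert(carry <= 2);
    limb_t c = carry;
    for (size_t i = 0; i < n; i++) {
        limb_t s = sp[i];
        limb_t l = s - c;                 /* subtract the incoming borrow */
        c = (limb_t)(l > s);              /* 1 if that subtraction wrapped */
        l = (limb_t)((uint64_t)l * 0xAAAAAAABu);  /* multiply by 1/3 mod 2^32 */
        rp[i] = l;
        c += (limb_t)(l > 0x55555555u);   /* quotient limb above b/3 */
        c += (limb_t)(l > 0xAAAAAAAAu);   /* quotient limb above 2b/3 */
    }
    return c;
}
```

For the single limb @math{a=1} with @math{i=0} the model gives
@math{q} = @code{0xAAAAAAAB} and @math{c=2}, and indeed
@math{2*2^32 + 1 = 3*q}; the remainder @math{3-c} is 1.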
  5409  
  5410  @deftypefun mp_limb_t mpn_mod_1 (const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
  5411  Divide @{@var{s1p}, @var{s1n}@} by @var{s2limb}, and return the remainder.
  5412  @var{s1n} can be zero.
  5413  @end deftypefun
  5414  
  5415  @deftypefun mp_limb_t mpn_lshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
  5416  Shift @{@var{sp}, @var{n}@} left by @var{count} bits, and write the result to
  5417  @{@var{rp}, @var{n}@}.  The bits shifted out at the left are returned in the
  5418  least significant @var{count} bits of the return value (the rest of the return
  5419  value is zero).
  5420  
  5421  @var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1.  The
  5422  regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
  5423  @math{@var{rp} @ge{} @var{sp}}.
  5424  
  5425  This function is written in assembly for most CPUs.
  5426  @end deftypefun
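
The overlap rule for @code{mpn_lshift} follows from the order in which limbs
are processed.  The following sketch models the documented behaviour on a
hypothetical 32-bit limb type @code{limb_t}; @code{ref_lshift} is an
illustrative name and this is not the assembly code GMP normally uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb. */
typedef uint32_t limb_t;

/* Model: shift {sp,n} left by count bits into {rp,n}; return shifted-out bits. */
limb_t ref_lshift(limb_t *rp, const limb_t *sp, size_t n, unsigned count)
{
    assert(n >= 1 && count >= 1 && count < 32);
    limb_t out = sp[n - 1] >> (32 - count);     /* bits leaving at the top */
    /* Work from the high limb down, so each source limb is read before the
       result limb at the same or a higher address is written. */
    for (size_t i = n - 1; i > 0; i--)
        rp[i] = (limb_t)(sp[i] << count) | (sp[i - 1] >> (32 - count));
    rp[0] = (limb_t)(sp[0] << count);
    return out;
}
```

Since the loop runs from high to low, the model is safe in place and whenever
the result area starts at or above the source, in line with the condition
given above.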
  5427  
  5428  @deftypefun mp_limb_t mpn_rshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
  5429  Shift @{@var{sp}, @var{n}@} right by @var{count} bits, and write the result to
  5430  @{@var{rp}, @var{n}@}.  The bits shifted out at the right are returned in the
  5431  most significant @var{count} bits of the return value (the rest of the return
  5432  value is zero).
  5433  
  5434  @var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1.  The
  5435  regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
  5436  @math{@var{rp} @le{} @var{sp}}.
  5437  
  5438  This function is written in assembly for most CPUs.
  5439  @end deftypefun
  5440  
  5441  @deftypefun int mpn_cmp (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5442  Compare @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@} and return a
  5443  positive value if @math{@var{s1} > @var{s2}}, 0 if they are equal, or a
  5444  negative value if @math{@var{s1} < @var{s2}}.
  5445  @end deftypefun
  5446  
  5447  @deftypefun int mpn_zero_p (const mp_limb_t *@var{sp}, mp_size_t @var{n})
  5448  Test @{@var{sp}, @var{n}@} and return 1 if the operand is zero, 0 otherwise.
  5449  @end deftypefun
  5450  
  5451  @deftypefun mp_size_t mpn_gcd (mp_limb_t *@var{rp}, mp_limb_t *@var{xp}, mp_size_t @var{xn}, mp_limb_t *@var{yp}, mp_size_t @var{yn})
  5452  Set @{@var{rp}, @var{retval}@} to the greatest common divisor of @{@var{xp},
  5453  @var{xn}@} and @{@var{yp}, @var{yn}@}.  The result can be up to @var{yn} limbs;
  5454  the return value is the actual number produced.  Both source operands are
  5455  destroyed.
  5456  
  5457  It is required that @math{@var{xn} @ge @var{yn} > 0}, and the most significant
  5458  limb of @{@var{yp}, @var{yn}@} must be non-zero.  No overlap is permitted
  5459  between @{@var{xp}, @var{xn}@} and @{@var{yp}, @var{yn}@}.
  5460  @end deftypefun
  5461  
  5462  @deftypefun mp_limb_t mpn_gcd_1 (const mp_limb_t *@var{xp}, mp_size_t @var{xn}, mp_limb_t @var{ylimb})
  5463  Return the greatest common divisor of @{@var{xp}, @var{xn}@} and @var{ylimb}.
  5464  Both operands must be non-zero.
  5465  @end deftypefun
  5466  
  5467  @deftypefun mp_size_t mpn_gcdext (mp_limb_t *@var{gp}, mp_limb_t *@var{sp}, mp_size_t *@var{sn}, mp_limb_t *@var{up}, mp_size_t @var{un}, mp_limb_t *@var{vp}, mp_size_t @var{vn})
  5468  Let @m{U,@var{U}} be defined by @{@var{up}, @var{un}@} and let @m{V,@var{V}} be
  5469  defined by @{@var{vp}, @var{vn}@}.
  5470  
  5471  Compute the greatest common divisor @math{G} of @math{U} and @math{V}.  Compute
  5472  a cofactor @math{S} such that @math{G = US + VT}.  The second cofactor @var{T}
  5473  is not computed but can easily be obtained from @m{(G - US) / V, (@var{G} -
  5474  @var{U}*@var{S}) / @var{V}} (the division will be exact).  It is required that
  5475  @math{@var{un} @ge @var{vn} > 0}, and the most significant
  5476  limb of @{@var{vp}, @var{vn}@} must be non-zero.
  5477  
  5478  @math{S} satisfies @math{S = 1} or @math{@GMPabs{S} < V / (2 G)}. @math{S =
  5479  0} if and only if @math{V} divides @math{U} (i.e., @math{G = V}).
  5480  
  5481  Store @math{G} at @var{gp} and let the return value define its limb count.
  5482  Store @math{S} at @var{sp} and let |*@var{sn}| define its limb count.  @math{S}
  5483  can be negative; when this happens *@var{sn} will be negative.  The area at
  5484  @var{gp} should have room for @var{vn} limbs and the area at @var{sp} should
  5485  have room for @math{@var{vn}+1} limbs.
  5486  
  5487  Both source operands are destroyed.
  5488  
  5489  Compatibility notes: GMP 4.3.0 and 4.3.1 defined @math{S} less strictly.
  5490  Earlier as well as later GMP releases define @math{S} as described here.
  5491  GMP releases before GMP 4.3.0 required additional space for both input and output
  5492  areas. More precisely, the areas @{@var{up}, @math{@var{un}+1}@} and
  5493  @{@var{vp}, @math{@var{vn}+1}@} were destroyed (i.e.@: the operands plus an
  5494  extra limb past the end of each), and the areas pointed to by @var{gp} and
  5495  @var{sp} should each have room for @math{@var{un}+1} limbs.
  5496  @end deftypefun
  5497  
  5498  @deftypefun mp_size_t mpn_sqrtrem (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
  5499  Compute the square root of @{@var{sp}, @var{n}@} and put the result at
  5500  @{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and the remainder at @{@var{r2p},
  5501  @var{retval}@}.  @var{r2p} needs space for @var{n} limbs, but the return value
  5502  indicates how many are produced.
  5503  
  5504  The most significant limb of @{@var{sp}, @var{n}@} must be non-zero.  The
  5505  areas @{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and @{@var{sp}, @var{n}@} must
  5506  be completely separate.  The areas @{@var{r2p}, @var{n}@} and @{@var{sp},
  5507  @var{n}@} must be either identical or completely separate.
  5508  
  5509  If the remainder is not wanted then @var{r2p} can be @code{NULL}, and in this
  5510  case the return value is zero or non-zero according to whether the remainder
  5511  would have been zero or non-zero.
  5512  
  5513  A return value of zero indicates a perfect square.  See also
  5514  @code{mpn_perfect_square_p}.
  5515  @end deftypefun
  5516  
  5517  @deftypefun size_t mpn_sizeinbase (const mp_limb_t *@var{xp}, mp_size_t @var{n}, int @var{base})
  5518  Return the size of @{@var{xp},@var{n}@} measured in number of digits in the
  5519  given @var{base}.  @var{base} can vary from 2 to 62.  Requires @math{@var{n} > 0}
  5520  and @math{@var{xp}[@var{n}-1] > 0}.  The result will be either exact or
  5521  1 too big.  If @var{base} is a power of 2, the result is always exact.
  5522  @end deftypefun
  5523  
  5524  @deftypefun mp_size_t mpn_get_str (unsigned char *@var{str}, int @var{base}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n})
  5525  Convert @{@var{s1p}, @var{s1n}@} to a raw unsigned char array at @var{str} in
  5526  base @var{base}, and return the number of characters produced.  There may be
  5527  leading zeros in the string.  The string is not in ASCII; to convert it to
  5528  printable format, add the ASCII codes for @samp{0} or @samp{A}, depending on
  5529  the base and range.  @var{base} can vary from 2 to 256.
  5530  
  5531  The most significant limb of the input @{@var{s1p}, @var{s1n}@} must be
  5532  non-zero.  The input @{@var{s1p}, @var{s1n}@} is clobbered, except when
  5533  @var{base} is a power of 2, in which case it's unchanged.
  5534  
  5535  The area at @var{str} has to have space for the largest possible number
  5536  represented by a @var{s1n} long limb array, plus one extra character.
  5537  @end deftypefun
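
Since the output of @code{mpn_get_str} holds raw digit values rather than
characters, a caller typically post-processes it.  A minimal sketch of that
step is shown below; @code{raw_digits_to_ascii} is a hypothetical helper, not a
GMP function, and it only handles bases 2 to 36 with upper-case letters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map raw digit values 0..base-1 to '0'-'9', 'A'-'Z'.
   out must have room for len characters plus a terminating NUL. */
void raw_digits_to_ascii(char *out, const unsigned char *digits,
                         size_t len, int base)
{
    assert(base >= 2 && base <= 36);
    for (size_t i = 0; i < len; i++) {
        unsigned d = digits[i];
        assert(d < (unsigned)base);
        out[i] = (char)(d < 10 ? '0' + d : 'A' + (d - 10));
    }
    out[len] = '\0';
}
```

Leading zero digits, if any, are kept by this helper; a caller that wants them
removed has to skip them separately.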
  5538  
  5539  @deftypefun mp_size_t mpn_set_str (mp_limb_t *@var{rp}, const unsigned char *@var{str}, size_t @var{strsize}, int @var{base})
  5540  Convert bytes @{@var{str},@var{strsize}@} in the given @var{base} to limbs at
  5541  @var{rp}.
  5542  
  5543  @math{@var{str}[0]} is the most significant input byte and
  5544  @math{@var{str}[@var{strsize}-1]} is the least significant input byte.  Each
  5545  byte should be a value in the range 0 to @math{@var{base}-1}, not an ASCII
  5546  character.  @var{base} can vary from 2 to 256.
  5547  
  5548  The converted value is @{@var{rp},@var{rn}@} where @var{rn} is the return
  5549  value.  If the most significant input byte @math{@var{str}[0]} is non-zero,
  5550  then @math{@var{rp}[@var{rn}-1]} will be non-zero, else
  5551  @math{@var{rp}[@var{rn}-1]} and some number of subsequent limbs may be zero.
  5552  
  5553  The area at @var{rp} has to have space for the largest possible number with
  5554  @var{strsize} digits in the chosen base, plus one extra limb.
  5555  
  5556  The input must have at least one byte, and no overlap is permitted between
  5557  @{@var{str},@var{strsize}@} and the result at @var{rp}.
  5558  @end deftypefun
  5559  
  5560  @deftypefun {mp_bitcnt_t} mpn_scan0 (const mp_limb_t *@var{s1p}, mp_bitcnt_t @var{bit})
  5561  Scan @var{s1p} from bit position @var{bit} for the next clear bit.
  5562  
  5563  It is required that there be a clear bit within the area at @var{s1p} at or
  5564  beyond bit position @var{bit}, so that the function has something to return.
  5565  @end deftypefun
  5566  
  5567  @deftypefun {mp_bitcnt_t} mpn_scan1 (const mp_limb_t *@var{s1p}, mp_bitcnt_t @var{bit})
  5568  Scan @var{s1p} from bit position @var{bit} for the next set bit.
  5569  
  5570  It is required that there be a set bit within the area at @var{s1p} at or
  5571  beyond bit position @var{bit}, so that the function has something to return.
  5572  @end deftypefun
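
The requirement that a matching bit exists is easy to see from a model of the
scan.  The sketch below uses a hypothetical 32-bit limb type @code{limb_t} and
an illustrative function @code{ref_scan1}; it is not GMP's implementation.
Note that the loop has no size argument and therefore no way to stop if no set
bit is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb. */
typedef uint32_t limb_t;

/* Model: index of the first set bit at or above position bit.
   The caller must guarantee that such a bit exists. */
size_t ref_scan1(const limb_t *s1p, size_t bit)
{
    assert(s1p != NULL);
    size_t i = bit / 32;
    /* Ignore bits below the starting position in the first limb. */
    limb_t w = s1p[i] & (limb_t)(UINT32_MAX << (bit % 32));
    while (w == 0)
        w = s1p[++i];          /* runs off the end if no bit is set */
    size_t pos = 0;
    while ((w & 1u) == 0) {
        w >>= 1;
        pos++;
    }
    return i * 32 + pos;
}
```

A model of @code{mpn_scan0} would be the same with each limb complemented
before testing.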
  5573  
  5574  @deftypefun void mpn_random (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
  5575  @deftypefunx void mpn_random2 (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
  5576  Generate a random number of length @var{r1n} and store it at @var{r1p}.  The
  5577  most significant limb is always non-zero.  @code{mpn_random} generates
  5578  uniformly distributed limb data; @code{mpn_random2} generates long strings of
  5579  zeros and ones in the binary representation.
  5580  
  5581  @code{mpn_random2} is intended for testing the correctness of the @code{mpn}
  5582  routines.
  5583  @end deftypefun
  5584  
  5585  @deftypefun {mp_bitcnt_t} mpn_popcount (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
  5586  Count the number of set bits in @{@var{s1p}, @var{n}@}.
  5587  @end deftypefun
  5588  
  5589  @deftypefun {mp_bitcnt_t} mpn_hamdist (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5590  Compute the Hamming distance between @{@var{s1p}, @var{n}@} and @{@var{s2p},
  5591  @var{n}@}, which is the number of bit positions where the two operands have
  5592  different bit values.
  5593  @end deftypefun
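
The two bit-counting functions are closely related: the Hamming distance of two
operands is the population count of their exclusive or.  The sketch below shows
this on a hypothetical 32-bit limb type @code{limb_t}; @code{ref_popcount} and
@code{ref_hamdist} are illustrative names, not GMP functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb. */
typedef uint32_t limb_t;

/* Model: number of set bits in {p,n}. */
size_t ref_popcount(const limb_t *p, size_t n)
{
    assert(p != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (limb_t w = p[i]; w != 0; w &= w - 1)   /* clear lowest set bit */
            count++;
    return count;
}

/* Model: number of bit positions where {a,n} and {b,n} differ. */
size_t ref_hamdist(const limb_t *a, const limb_t *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = a[i] ^ b[i];
        count += ref_popcount(&x, 1);
    }
    return count;
}
```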
  5594  
  5595  @deftypefun int mpn_perfect_square_p (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
  5596  Return non-zero iff @{@var{s1p}, @var{n}@} is a perfect square.
  5597  The most significant limb of the input @{@var{s1p}, @var{n}@} must be
  5598  non-zero.
  5599  @end deftypefun
  5600  
  5601  @deftypefun void mpn_and_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5602  Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and @{@var{s2p},
  5603  @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
  5604  @end deftypefun
  5605  
  5606  @deftypefun void mpn_ior_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5607  Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and
  5608  @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
  5609  @end deftypefun
  5610  
  5611  @deftypefun void mpn_xor_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5612  Perform the bitwise logical exclusive or of @{@var{s1p}, @var{n}@} and
  5613  @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
  5614  @end deftypefun
  5615  
  5616  @deftypefun void mpn_andn_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5617  Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and the bitwise
  5618  complement of @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
  5619  @end deftypefun
  5620  
  5621  @deftypefun void mpn_iorn_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5622  Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and the bitwise
  5623  complement of @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
  5624  @end deftypefun
  5625  
  5626  @deftypefun void mpn_nand_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5627  Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and @{@var{s2p},
  5628  @var{n}@}, and write the bitwise complement of the result to @{@var{rp}, @var{n}@}.
  5629  @end deftypefun
  5630  
  5631  @deftypefun void mpn_nior_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5632  Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and
  5633  @{@var{s2p}, @var{n}@}, and write the bitwise complement of the result to
  5634  @{@var{rp}, @var{n}@}.
  5635  @end deftypefun
  5636  
  5637  @deftypefun void mpn_xnor_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5638  Perform the bitwise logical exclusive or of @{@var{s1p}, @var{n}@} and
  5639  @{@var{s2p}, @var{n}@}, and write the bitwise complement of the result to
  5640  @{@var{rp}, @var{n}@}.
  5641  @end deftypefun
  5642  
  5643  @deftypefun void mpn_com (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
  5644  Perform the bitwise complement of @{@var{sp}, @var{n}@}, and write the result
  5645  to @{@var{rp}, @var{n}@}.
  5646  @end deftypefun
  5647  
  5648  @deftypefun void mpn_copyi (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
  5649  Copy from @{@var{s1p}, @var{n}@} to @{@var{rp}, @var{n}@}, increasingly.
  5650  @end deftypefun
  5651  
  5652  @deftypefun void mpn_copyd (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
  5653  Copy from @{@var{s1p}, @var{n}@} to @{@var{rp}, @var{n}@}, decreasingly.
  5654  @end deftypefun
  5655  
  5656  @deftypefun void mpn_zero (mp_limb_t *@var{rp}, mp_size_t @var{n})
  5657  Zero @{@var{rp}, @var{n}@}.
  5658  @end deftypefun
  5659  
  5660  @sp 1
  5661  @section Low-level functions for cryptography
  5662  @cindex Low-level functions for cryptography
  5663  @cindex Cryptography functions, low-level
  5664  
  5665  The functions prefixed with @code{mpn_sec_} and @code{mpn_cnd_} are designed to
  5666  perform the exact same low-level operations and have the same cache access
  5667  patterns for any two same-size arguments, assuming that function arguments are
  5668  placed at the same position and that the machine state is identical upon
  5669  function entry.  These functions are intended for cryptographic purposes, where
  5670  resilience to side-channel attacks is desired.
  5671  
  5672  These functions are less efficient than their ``leaky'' counterparts; their
  5673  performance for operands of the sizes typically used for cryptographic
  5674  applications is between 15% and 100% worse.  For larger operands, these
  5675  functions might be inadequate, since they rely on asymptotically elementary
  5676  algorithms.
  5677  
  5678  These functions do not make any explicit allocations.  Any function that
  5679  needs scratch space accepts a scratch space operand.  This convention allows
  5680  callers to keep sensitive data in designated memory areas.  Note however that
  5681  compilers may choose to spill scalar values used within these functions to
  5682  their stack frame and that such scalars may contain sensitive data.
  5683  
  5684  In addition to these specially crafted functions, the following @code{mpn}
  5685  functions are naturally side-channel resistant: @code{mpn_add_n},
  5686  @code{mpn_sub_n}, @code{mpn_lshift}, @code{mpn_rshift}, @code{mpn_zero},
  5687  @code{mpn_copyi}, @code{mpn_copyd}, @code{mpn_com}, and the logical functions
  5688  (@code{mpn_and_n}, etc).
  5689  
  5690  There are some exceptions to this side-channel resilience: (1) Some assembly
  5691  implementations of @code{mpn_lshift} identify shift-by-one as a special case.
  5692  This is a problem iff the shift count is a function of sensitive data.  (2)
  5693  Alpha ev6 and Pentium4 using 64-bit limbs have leaky @code{mpn_add_n} and
  5694  @code{mpn_sub_n}.  (3) Alpha ev6 has a leaky @code{mpn_mul_1} which also makes
  5695  @code{mpn_sec_mul} on those systems unsafe.
  5696  
  5697  @deftypefun mp_limb_t mpn_cnd_add_n (mp_limb_t @var{cnd}, mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5698  @deftypefunx mp_limb_t mpn_cnd_sub_n (mp_limb_t @var{cnd}, mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
  5699  These functions do conditional addition and subtraction.  If @var{cnd} is
  5700  non-zero, they produce the same result as a regular @code{mpn_add_n} or
  5701  @code{mpn_sub_n}, and if @var{cnd} is zero, they copy @{@var{s1p},@var{n}@} to
  5702  the result area and return zero.  The functions are designed to have timing and
  5703  memory access patterns depending only on size and location of the data areas,
  5704  but independent of the condition @var{cnd}.  As with @code{mpn_add_n} and
  5705  @code{mpn_sub_n}, on most machines, the timing will also be independent of the
  5706  actual limb values.
  5707  @end deftypefun
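
One common way to obtain a conditional addition whose memory accesses do not
depend on the condition is to turn @var{cnd} into an all-ones or all-zeros mask
and apply it to the second operand.  The sketch below illustrates that idea on
a hypothetical 32-bit limb type @code{limb_t}; @code{ref_cnd_add_n} is an
illustrative name, and no claim is made that this is how GMP implements
@code{mpn_cnd_add_n} or that a given compiler will keep the code branch-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb. */
typedef uint32_t limb_t;

/* Model: rp = s1p + (cnd ? s2p : 0), reading and writing every limb. */
limb_t ref_cnd_add_n(limb_t cnd, limb_t *rp,
                     const limb_t *s1p, const limb_t *s2p, size_t n)
{
    assert(n >= 1);
    /* 0xFFFFFFFF when cnd is non-zero, 0 otherwise. */
    limb_t mask = (limb_t)0 - (limb_t)(cnd != 0);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)s1p[i] + (s2p[i] & mask) + carry;
        rp[i] = (limb_t)sum;
        carry = (limb_t)(sum >> 32);
    }
    return carry;
}
```

When @var{cnd} is zero the masked operand is zero, so the loop copies
@{@var{s1p},@var{n}@} and returns zero, as described above.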
  5708  
  5709  @deftypefun mp_limb_t mpn_sec_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{n}, mp_limb_t @var{b}, mp_limb_t *@var{tp})
  5710  @deftypefunx mp_limb_t mpn_sec_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{n}, mp_limb_t @var{b}, mp_limb_t *@var{tp})
  5711  Set @var{R} to @var{A} + @var{b} or @var{A} - @var{b}, respectively, where
  5712  @var{R} = @{@var{rp},@var{n}@}, @var{A} = @{@var{ap},@var{n}@}, and @var{b} is
  5713  a single limb.  Return the carry (or borrow, for subtraction).
  5714  
  5715  These functions take @math{O(N)} time, unlike the leaky functions
  5716  @code{mpn_add_1} and @code{mpn_sub_1}, which are @math{O(1)} on average.  They
  5717  require scratch space of @code{mpn_sec_add_1_itch(@var{n})} and
  5718  @code{mpn_sec_sub_1_itch(@var{n})} limbs, respectively, to be passed in the
  5719  @var{tp} parameter.  The scratch space requirements are guaranteed to be at most
  5720  @var{n} limbs, and to increase monotonically in the operand size.
  5721  @end deftypefun
  5722  
  5723  @deftypefun void mpn_cnd_swap (mp_limb_t @var{cnd}, volatile mp_limb_t *@var{ap}, volatile mp_limb_t *@var{bp}, mp_size_t @var{n})
  5724  If @var{cnd} is non-zero, swaps the contents of the areas @{@var{ap},@var{n}@}
  5725  and @{@var{bp},@var{n}@}. Otherwise, the areas are left unmodified.
  5726  Implemented using logical operations on the limbs, with the same memory
  5727  accesses independent of the value of @var{cnd}.
  5728  @end deftypefun
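
A conditional swap ``using logical operations on the limbs'' can be sketched
with a mask and exclusive or, as below.  The 32-bit limb type @code{limb_t} and
the name @code{ref_cnd_swap} are hypothetical and used only for illustration;
whether the compiled code is free of data-dependent branches depends on the
compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb. */
typedef uint32_t limb_t;

/* Model: swap {ap,n} and {bp,n} if cnd is non-zero; every limb of both
   areas is read and written in either case. */
void ref_cnd_swap(limb_t cnd, volatile limb_t *ap, volatile limb_t *bp,
                  size_t n)
{
    assert(ap != NULL && bp != NULL);
    /* 0xFFFFFFFF when cnd is non-zero, 0 otherwise. */
    limb_t mask = (limb_t)0 - (limb_t)(cnd != 0);
    for (size_t i = 0; i < n; i++) {
        limb_t a = ap[i];
        limb_t b = bp[i];
        limb_t t = (a ^ b) & mask;   /* difference, or 0 when not swapping */
        ap[i] = a ^ t;
        bp[i] = b ^ t;
    }
}
```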
  5729  
  5730  @deftypefun void mpn_sec_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{an}, const mp_limb_t *@var{bp}, mp_size_t @var{bn}, mp_limb_t *@var{tp})
  5731  @deftypefunx mp_size_t mpn_sec_mul_itch (mp_size_t @var{an}, mp_size_t @var{bn})
  5732  Set @var{R} to @math{A @times{} B}, where @var{A} = @{@var{ap},@var{an}@},
  5733  @var{B} = @{@var{bp},@var{bn}@}, and @var{R} =
  5734  @{@var{rp},@math{@var{an}+@var{bn}}@}.
  5735  
  5736  It is required that @math{@var{an} @ge @var{bn} > 0}.
  5737  
  5738  No overlapping between @var{R} and the input operands is allowed.  For
  5739  @math{@var{A} = @var{B}}, use @code{mpn_sec_sqr} for optimal performance.
  5740  
  5741  This function requires scratch space of @code{mpn_sec_mul_itch(@var{an},
  5742  @var{bn})} limbs to be passed in the @var{tp} parameter.  The scratch space
  5743  requirements are guaranteed to increase monotonically in the operand sizes.
  5744  @end deftypefun
  5745  
  5746  
  5747  @deftypefun void mpn_sec_sqr (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{an}, mp_limb_t *@var{tp})
  5748  @deftypefunx mp_size_t mpn_sec_sqr_itch (mp_size_t @var{an})
  5749  Set @var{R} to @math{A^2}, where @var{A} = @{@var{ap},@var{an}@}, and @var{R} =
  5750  @{@var{rp},@math{2@var{an}}@}.
  5751  
  5752  It is required that @math{@var{an} > 0}.
  5753  
  5754  No overlapping between @var{R} and the input operands is allowed.
  5755  
  5756  This function requires scratch space of @code{mpn_sec_sqr_itch(@var{an})} limbs
  5757  to be passed in the @var{tp} parameter.  The scratch space requirements are
  5758  guaranteed to increase monotonically in the operand size.
  5759  @end deftypefun
  5760  
  5761  
  5762  @deftypefun void mpn_sec_powm (mp_limb_t *@var{rp}, const mp_limb_t *@var{bp}, mp_size_t @var{bn}, const mp_limb_t *@var{ep}, mp_bitcnt_t @var{enb},  const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_limb_t *@var{tp})
  5763  @deftypefunx mp_size_t mpn_sec_powm_itch (mp_size_t @var{bn}, mp_bitcnt_t @var{enb}, size_t @var{n})
  5764  Set @var{R} to @m{B^E \bmod @var{M}, (@var{B} raised to @var{E}) modulo
  5765  @var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{M} = @{@var{mp},@var{n}@},
  5766  and @var{E} = @{@var{ep},@math{@GMPceil{@var{enb} /
  5767  @code{GMP\_NUMB\_BITS}}}@}.
  5768  
  5769  It is required that @math{@var{B} > 0}, that @math{@var{M} > 0} is odd, and
  5770  that @m{@var{E} < 2@GMPraise{@var{enb}}, @var{E} < 2^@var{enb}}.
  5771  
  5772  No overlapping between @var{R} and the input operands is allowed.
  5773  
  5774  This function requires scratch space of @code{mpn_sec_powm_itch(@var{bn},
  5775  @var{enb}, @var{n})} limbs to be passed in the @var{tp} parameter.  The scratch
  5776  space requirements are guaranteed to increase monotonically in the operand
  5777  sizes.
  5778  @end deftypefun
  5779  
  5780  @deftypefun void mpn_sec_tabselect (mp_limb_t *@var{rp}, const mp_limb_t *@var{tab}, mp_size_t @var{n}, mp_size_t @var{nents}, mp_size_t @var{which})
  5781  Select entry @var{which} from table @var{tab}, which has @var{nents} entries, each @var{n}
  5782  limbs.  Store the selected entry at @var{rp}.
  5783  
  5784  This function reads the entire table to avoid side-channel information leaks.
  5785  @end deftypefun
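
Reading the entire table means that the selected index never appears in an
address computation.  A minimal sketch of such a selection is given below,
assuming a hypothetical 32-bit limb type @code{limb_t} and an illustrative
function name @code{ref_tabselect}; it models the documented behaviour and is
not GMP's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb. */
typedef uint32_t limb_t;

/* Model: copy entry `which` of a table of nents entries, n limbs each,
   to rp while touching every entry of the table. */
void ref_tabselect(limb_t *rp, const limb_t *tab,
                   size_t n, size_t nents, size_t which)
{
    assert(which < nents);
    for (size_t i = 0; i < n; i++)
        rp[i] = 0;
    for (size_t k = 0; k < nents; k++) {
        /* All ones for the wanted entry, all zeros for the others. */
        limb_t mask = (limb_t)0 - (limb_t)(k == which);
        for (size_t i = 0; i < n; i++)
            rp[i] |= tab[k * n + i] & mask;
    }
}
```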
  5786  
  5787  @deftypefun mp_limb_t mpn_sec_div_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}, mp_limb_t *@var{tp})
  5788  @deftypefunx mp_size_t mpn_sec_div_qr_itch (mp_size_t @var{nn}, mp_size_t @var{dn})
  5789  
  5790  Set @var{Q} to @m{\lfloor @var{N} / @var{D}\rfloor, the truncated quotient
  5791  @var{N} / @var{D}} and @var{R} to @m{@var{N} \bmod @var{D}, @var{N} modulo
  5792  @var{D}}, where @var{N} = @{@var{np},@var{nn}@}, @var{D} =
  5793  @{@var{dp},@var{dn}@}, @var{Q}'s most significant limb is the function return
  5794  value and the remaining limbs are @{@var{qp},@var{nn}@minus{}@var{dn}@}, and @var{R} =
  5795  @{@var{np},@var{dn}@}.
  5796  
  5797  It is required that @math{@var{nn} @ge @var{dn} @ge 1}, and that
  5798  @m{@var{dp}[@var{dn}-1] @neq 0, @var{dp}[@var{dn}-1] != 0}.  This does not
  5799  imply that @math{@var{N} @ge @var{D}} since @var{N} might be zero-padded.
  5800  
  5801  Note that @var{N} and @var{R} overlap.  No other operand overlapping
  5802  is allowed.  The entire space occupied by @var{N} is overwritten.
  5803  
  5804  This function requires scratch space of @code{mpn_sec_div_qr_itch(@var{nn},
  5805  @var{dn})} limbs to be passed in the @var{tp} parameter.
  5806  @end deftypefun
  5807  
  5808  @deftypefun void mpn_sec_div_r (mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}, mp_limb_t *@var{tp})
  5809  @deftypefunx mp_size_t mpn_sec_div_r_itch (mp_size_t @var{nn}, mp_size_t @var{dn})
  5810  
  5811  Set @var{R} to @m{@var{N} \bmod @var{D}, @var{N} modulo @var{D}}, where @var{N}
  5812  = @{@var{np},@var{nn}@}, @var{D} = @{@var{dp},@var{dn}@}, and @var{R} =
  5813  @{@var{np},@var{dn}@}.
  5814  
  5815  It is required that @math{@var{nn} @ge @var{dn} @ge 1}, and that
  5816  @m{@var{dp}[@var{dn}-1] @neq 0, @var{dp}[@var{dn}-1] != 0}.  This does not
  5817  imply that @math{@var{N} @ge @var{D}} since @var{N} might be zero-padded.
  5818  
  5819  Note that @var{N} and @var{R} overlap.  No other operand overlapping
  5820  is allowed.  The entire space occupied by @var{N} is overwritten.
  5821  
  5822  This function requires scratch space of @code{mpn_sec_div_r_itch(@var{nn},
  5823  @var{dn})} limbs to be passed in the @var{tp} parameter.
  5824  @end deftypefun
  5825  
  5826  @deftypefun int mpn_sec_invert (mp_limb_t *@var{rp}, mp_limb_t *@var{ap}, const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_bitcnt_t @var{nbcnt}, mp_limb_t *@var{tp})
  5827  @deftypefunx mp_size_t mpn_sec_invert_itch (mp_size_t @var{n})
  5828  Set @var{R} to @m{@var{A}^{-1} \bmod @var{M}, the inverse of @var{A} modulo
  5829  @var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{A} = @{@var{ap},@var{n}@},
  5830  and @var{M} = @{@var{mp},@var{n}@}.  @strong{This function's interface is
  5831  preliminary.}
  5832  
  5833  If an inverse exists, return 1, otherwise return 0 and leave @var{R}
  5834  undefined. In either case, the input @var{A} is destroyed.
  5835  
  5836  It is required that @var{M} is odd, and that @math{@var{nbcnt} @ge
  5837  @GMPceil{\log(@var{A}+1)} + @GMPceil{\log(@var{M}+1)}}.  A safe choice is
  5838  @m{@var{nbcnt} = 2@var{n} @times{} @code{GMP\_NUMB\_BITS}, @var{nbcnt} = 2
  5839  @times{} @var{n} @times{} GMP_NUMB_BITS}, but a smaller value might improve
  5840  performance if @var{M} or @var{A} are known to have leading zero bits.
  5841  
  5842  This function requires scratch space of @code{mpn_sec_invert_itch(@var{n})}
  5843  limbs to be passed in the @var{tp} parameter.
  5844  @end deftypefun
  5845  
  5846  
  5847  @sp 1
  5848  @section Nails
  5849  @cindex Nails
  5850  
  5851  @strong{Everything in this section is highly experimental and may disappear or
  5852  be subject to incompatible changes in a future version of GMP.}
  5853  
  5854  Nails are an experimental feature whereby a few bits are left unused at the
  5855  top of each @code{mp_limb_t}.  This can significantly improve carry handling
  5856  on some processors.
  5857  
  5858  All the @code{mpn} functions accepting limb data will expect the nail bits to
  5859  be zero on entry, and will return data with the nails similarly all zero.
  5860  This applies both to limb vectors and to single limb arguments.
  5861  
  5862  Nails can be enabled by configuring with @samp{--enable-nails}.  By default
  5863  the number of bits will be chosen according to what suits the host processor,
  5864  but a particular number can be selected with @samp{--enable-nails=N}.
  5865  
  5866  At the mpn level, a nail build is neither source nor binary compatible with a
  5867  non-nail build, strictly speaking.  But programs acting on limbs only through
  5868  the mpn functions are likely to work equally well with either build, and
  5869  judicious use of the definitions below should make any program compatible with
  5870  either build, at the source level.
  5871  
  5872  For the higher level routines, meaning @code{mpz} etc, a nail build should be
  5873  fully source and binary compatible with a non-nail build.
  5874  
  5875  @defmac GMP_NAIL_BITS
  5876  @defmacx GMP_NUMB_BITS
  5877  @defmacx GMP_LIMB_BITS
  5878  @code{GMP_NAIL_BITS} is the number of nail bits, or 0 when nails are not in
  5879  use.  @code{GMP_NUMB_BITS} is the number of data bits in a limb.
  5880  @code{GMP_LIMB_BITS} is the total number of bits in an @code{mp_limb_t}.  In
  5881  all cases
  5882  
  5883  @example
  5884  GMP_LIMB_BITS == GMP_NAIL_BITS + GMP_NUMB_BITS
  5885  @end example
  5886  @end defmac
  5887  
  5888  @defmac GMP_NAIL_MASK
  5889  @defmacx GMP_NUMB_MASK
  5890  Bit masks for the nail and number parts of a limb.  @code{GMP_NAIL_MASK} is 0
  5891  when nails are not in use.
  5892  
  5893  @code{GMP_NAIL_MASK} is not often needed, since the nail part can be obtained
  5894  with @code{x >> GMP_NUMB_BITS}, and that means one less large constant, which
  5895  can help various RISC chips.
  5896  @end defmac
  5897  
  5898  @defmac GMP_NUMB_MAX
  5899  The maximum value that can be stored in the number part of a limb.  This is
  5900  the same as @code{GMP_NUMB_MASK}, but can be used for clarity when doing
  5901  comparisons rather than bit-wise operations.
  5902  @end defmac
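
The following sketch shows one way the definitions above might be used to
split a raw limb into its number and nail parts so that the same source works
with either build.  The helper names @code{numb_part} and @code{nail_part} are
invented for this example.  Note that in a non-nail build
@code{GMP_NUMB_BITS} equals the limb width, and shifting by the full width is
undefined in C, so that case is handled separately here.

@example
#include <gmp.h>

/* Hypothetical helper: the data bits of a limb.  */
mp_limb_t
numb_part (mp_limb_t x)
@{
  return x & GMP_NUMB_MASK;
@}

/* Hypothetical helper: the nail bits of a limb, shifted down.  */
mp_limb_t
nail_part (mp_limb_t x)
@{
#if GMP_NAIL_BITS != 0
  return x >> GMP_NUMB_BITS;
#else
  (void) x;                     /* no nails in this build */
  return 0;
#endif
@}
@end example

For any limb @code{x}, @code{numb_part (x) <= GMP_NUMB_MAX} holds, which is
the kind of comparison @code{GMP_NUMB_MAX} is intended for.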
  5903  
  5904  The term ``nails'' comes from finger or toe nails, which are at the ends of a
  5905  limb (arm or leg).  ``numb'' is short for number, but is also how the
  5906  developers felt after trying for a long time to come up with sensible names
  5907  for these things.
  5908  
  5909  In the future (the distant future most likely) a non-zero nail might be
  5910  permitted, giving non-unique representations for numbers in a limb vector.
  5911  This would help vector processors since carries would only ever need to
  5912  propagate one or two limbs.
  5913  
  5914  
  5915  @node Random Number Functions, Formatted Output, Low-level Functions, Top
  5916  @chapter Random Number Functions
  5917  @cindex Random number functions
  5918  
  5919  Sequences of pseudo-random numbers in GMP are generated using a variable of
  5920  type @code{gmp_randstate_t}, which holds an algorithm selection and a current
  5921  state.  Such a variable must be initialized by a call to one of the
  5922  @code{gmp_randinit} functions, and can be seeded with one of the
  5923  @code{gmp_randseed} functions.
  5924  
  5925  The functions actually generating random numbers are described in @ref{Integer
  5926  Random Numbers}, and @ref{Miscellaneous Float Functions}.
  5927  
  5928  The older style random number functions don't accept a @code{gmp_randstate_t}
  5929  parameter but instead share a global variable of that type.  They use a
  5930  default algorithm and are currently not seeded (though perhaps that will
  5931  change in the future).  The new functions accepting a @code{gmp_randstate_t}
  5932  are recommended for applications that care about randomness.
  5933  
  5934  @menu
  5935  * Random State Initialization::
  5936  * Random State Seeding::
  5937  * Random State Miscellaneous::
  5938  @end menu
  5939  
  5940  @node Random State Initialization, Random State Seeding, Random Number Functions, Random Number Functions
  5941  @section Random State Initialization
  5942  @cindex Random number state
  5943  @cindex Initialization functions
  5944  
  5945  @deftypefun void gmp_randinit_default (gmp_randstate_t @var{state})
  5946  Initialize @var{state} with a default algorithm.  This will be a compromise
  5947  between speed and randomness, and is recommended for applications with no
  5948  special requirements.  Currently this is @code{gmp_randinit_mt}.
  5949  @end deftypefun
  5950  
  5951  @deftypefun void gmp_randinit_mt (gmp_randstate_t @var{state})
  5952  @cindex Mersenne twister random numbers
  5953  Initialize @var{state} for a Mersenne Twister algorithm.  This algorithm is
  5954  fast and has good randomness properties.
  5955  @end deftypefun
  5956  
  5957  @deftypefun void gmp_randinit_lc_2exp (gmp_randstate_t @var{state}, const mpz_t @var{a}, @w{unsigned long @var{c}}, @w{mp_bitcnt_t @var{m2exp}})
  5958  @cindex Linear congruential random numbers
  5959  Initialize @var{state} with a linear congruential algorithm @m{X = (@var{a}X +
  5960  @var{c}) @bmod 2^{m2exp}, X = (@var{a}*X + @var{c}) mod 2^@var{m2exp}}.
  5961  
  5962  The low bits of @math{X} in this algorithm are not very random.  The least
  5963  significant bit will have a period no more than 2, and the second bit no more
  5964  than 4, etc.  For this reason only the high half of each @math{X} is actually
  5965  used.
  5966  
  5967  When a random number of more than @math{@var{m2exp}/2} bits is to be
  5968  generated, multiple iterations of the recurrence are used and the results
  5969  concatenated.
  5970  @end deftypefun
  5971  
  5972  @deftypefun int gmp_randinit_lc_2exp_size (gmp_randstate_t @var{state}, mp_bitcnt_t @var{size})
  5973  @cindex Linear congruential random numbers
  5974  Initialize @var{state} for a linear congruential algorithm as per
  5975  @code{gmp_randinit_lc_2exp}.  @var{a}, @var{c} and @var{m2exp} are selected
  5976  from a table, chosen so that @var{size} bits (or more) of each @math{X} will
  5977  be used, i.e.@: @math{@var{m2exp}/2 @ge{} @var{size}}.
  5978  
  5979  If successful the return value is non-zero.  If @var{size} is bigger than the
  5980  table data provides then the return value is zero.  The maximum @var{size}
  5981  currently supported is 128.
  5982  @end deftypefun
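
Since @code{gmp_randinit_lc_2exp_size} can fail, its return value should be
checked before the state is used.  A minimal sketch follows; the helper name
@code{lc_size_supported} is invented for this example, and it assumes that a
state which failed to initialize must not be passed to @code{gmp_randclear}.

@example
#include <gmp.h>

/* Hypothetical helper: non-zero if the table has an entry giving
   at least SIZE bits per iteration.  */
int
lc_size_supported (mp_bitcnt_t size)
@{
  gmp_randstate_t state;

  if (gmp_randinit_lc_2exp_size (state, size) == 0)
    return 0;       /* assumed: nothing was set up, nothing to clear */

  gmp_randclear (state);
  return 1;
@}
@end example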
  5983  
  5984  @deftypefun void gmp_randinit_set (gmp_randstate_t @var{rop}, gmp_randstate_t @var{op})
  5985  Initialize @var{rop} with a copy of the algorithm and state from @var{op}.
  5986  @end deftypefun
  5987  
  5988  @c  Although gmp_randinit, gmp_errno and related constants are obsolete, we
  5989  @c  still put @findex entries for them, since they're still documented and
  5990  @c  someone might be looking them up when perusing old application code.
  5991  
  5992  @deftypefun void gmp_randinit (gmp_randstate_t @var{state}, @w{gmp_randalg_t @var{alg}}, @dots{})
  5993  @strong{This function is obsolete.}
  5994  
  5995  @findex GMP_RAND_ALG_LC
  5996  @findex GMP_RAND_ALG_DEFAULT
  5997  Initialize @var{state} with an algorithm selected by @var{alg}.  The only
  5998  choice is @code{GMP_RAND_ALG_LC}, which is @code{gmp_randinit_lc_2exp_size}
  5999  described above.  A third parameter of type @code{unsigned long} is required;
  6000  this is the @var{size} for that function.  @code{GMP_RAND_ALG_DEFAULT} or 0
  6001  are the same as @code{GMP_RAND_ALG_LC}.
  6002  
  6003  @c  For reference, this is the only place gmp_errno has been documented, and
  6004  @c  due to being non thread safe we won't be adding to its uses.
  6005  @findex gmp_errno
  6006  @findex GMP_ERROR_UNSUPPORTED_ARGUMENT
  6007  @findex GMP_ERROR_INVALID_ARGUMENT
  6008  @code{gmp_randinit} sets bits in the global variable @code{gmp_errno} to
  6009  indicate an error: @code{GMP_ERROR_UNSUPPORTED_ARGUMENT} if @var{alg} is
  6010  unsupported, or @code{GMP_ERROR_INVALID_ARGUMENT} if the @var{size} parameter
  6011  is too big.  It may be noted this error reporting is not thread safe (a good
  6012  reason to use @code{gmp_randinit_lc_2exp_size} instead).
  6013  @end deftypefun
  6014  
  6015  @deftypefun void gmp_randclear (gmp_randstate_t @var{state})
  6016  Free all memory occupied by @var{state}.
  6017  @end deftypefun
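
The usual life cycle of a state is initialize, seed, generate, clear.  The
sketch below puts those steps together using @code{mpz_urandomb}
(@pxref{Integer Random Numbers}).  The helper name @code{draw} and the idea of
creating a fresh state per call are choices made for this example only; a
real program would normally keep one state alive for many draws.

@example
#include <gmp.h>

/* Hypothetical helper: one BITS-bit number from a freshly
   initialized default generator seeded with SEED.  */
void
draw (mpz_t rop, unsigned long seed, mp_bitcnt_t bits)
@{
  gmp_randstate_t state;

  gmp_randinit_default (state);      /* choose the algorithm */
  gmp_randseed_ui (state, seed);     /* set the starting point */
  mpz_urandomb (rop, state, bits);   /* 0 to 2^bits-1 inclusive */
  gmp_randclear (state);             /* release the state */
@}
@end example

Two calls to @code{draw} with the same @var{seed} and @var{bits} give the same
value, since the algorithm and seed together determine the sequence.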
  6018  
  6019  
  6020  @node Random State Seeding, Random State Miscellaneous, Random State Initialization, Random Number Functions
  6021  @section Random State Seeding
  6022  @cindex Random number seeding
  6023  @cindex Seeding random numbers
  6024  
  6025  @deftypefun void gmp_randseed (gmp_randstate_t @var{state}, const mpz_t @var{seed})
  6026  @deftypefunx void gmp_randseed_ui (gmp_randstate_t @var{state}, @w{unsigned long int @var{seed}})
  6027  Set an initial seed value into @var{state}.
  6028  
  6029  The size of a seed determines how many different sequences of random numbers
  6030  it's possible to generate.  The ``quality'' of the seed is the randomness
  6031  of a given seed compared to the previous seed used, and this affects the
  6032  randomness of separate number sequences.  The method for choosing a seed is
  6033  critical if the generated numbers are to be used for important applications,
  6034  such as generating cryptographic keys.
  6035  
  6036  Traditionally the system time has been used to seed, but care needs to be
  6037  taken with this.  If an application seeds often and the resolution of the
  6038  system clock is low, then the same sequence of numbers might be repeated.
  6039  Also, the system time is quite easy to guess, so if unpredictability is
  6040  required then it should definitely not be the only source for the seed value.
  6041  On some systems there's a special device @file{/dev/random} which provides
  6042  random data better suited for use as a seed.
  6043  @end deftypefun
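
One possible way to seed from a system random device is sketched below.  The
helper name @code{seed_from_system}, the 32-byte seed size and the use of
@file{/dev/urandom} are assumptions made for this example; substitute
@file{/dev/random} or another source as appropriate for the system.  The
fallback to the clock is subject to the weaknesses described above.

@example
#include <stdio.h>
#include <time.h>
#include <gmp.h>

/* Hypothetical helper: seed STATE from the random device if it
   can be read, otherwise from the clock.  Returns 1 if device
   data was used, 0 if the fallback was taken.  */
int
seed_from_system (gmp_randstate_t state)
@{
  unsigned char buf[32];
  FILE *fp = fopen ("/dev/urandom", "rb");
  int got = 0;

  if (fp != NULL)
    @{
      got = (fread (buf, 1, sizeof buf, fp) == sizeof buf);
      fclose (fp);
    @}

  if (got)
    @{
      mpz_t seed;
      mpz_init (seed);
      /* bytes, most significant first, no nail bits */
      mpz_import (seed, sizeof buf, 1, 1, 0, 0, buf);
      gmp_randseed (state, seed);
      mpz_clear (seed);
    @}
  else
    gmp_randseed_ui (state, (unsigned long) time (NULL));

  return got;
@}
@end example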
  6044  
  6045  
  6046  @node Random State Miscellaneous,  , Random State Seeding, Random Number Functions
  6047  @section Random State Miscellaneous
  6048  
  6049  @deftypefun {unsigned long} gmp_urandomb_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
  6050  Return a uniformly distributed random number of @var{n} bits, i.e.@: in the
  6051  range 0 to @m{2^n-1,2^@var{n}-1} inclusive.  @var{n} must be less than or
  6052  equal to the number of bits in an @code{unsigned long}.
  6053  @end deftypefun
  6054  
  6055  @deftypefun {unsigned long} gmp_urandomm_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
  6056  Return a uniformly distributed random number in the range 0 to
  6057  @math{@var{n}-1}, inclusive.
  6058  @end deftypefun
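
As a small illustration of @code{gmp_urandomm_ui}, a die roll can be obtained
by shifting the range 0 to 5 up by one.  The helper name @code{roll_die} is
invented for this example, and @var{state} is assumed to be initialized and
seeded already.

@example
#include <gmp.h>

/* Hypothetical helper: uniform value in the range 1 to 6.  */
unsigned long
roll_die (gmp_randstate_t state)
@{
  return 1 + gmp_urandomm_ui (state, 6);
@}
@end example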
  6059  
  6060  
  6061  @node Formatted Output, Formatted Input, Random Number Functions, Top
  6062  @chapter Formatted Output
  6063  @cindex Formatted output
  6064  @cindex @code{printf} formatted output
  6065  
  6066  @menu
  6067  * Formatted Output Strings::
  6068  * Formatted Output Functions::
  6069  * C++ Formatted Output::
  6070  @end menu
  6071  
  6072  @node Formatted Output Strings, Formatted Output Functions, Formatted Output, Formatted Output
  6073  @section Format Strings
  6074  
  6075  @code{gmp_printf} and friends accept format strings similar to the standard C
  6076  @code{printf} (@pxref{Formatted Output,, Formatted Output, libc, The GNU C
  6077  Library Reference Manual}).  A format specification is of the form
  6078  
  6079  @example
  6080  % [flags] [width] [.[precision]] [type] conv
  6081  @end example
  6082  
  6083  GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
  6084  and @code{mpf_t} respectively, @samp{M} for @code{mp_limb_t}, and @samp{N} for
  6085  an @code{mp_limb_t} array.  @samp{Z}, @samp{Q}, @samp{M} and @samp{N} behave
  6086  like integers.  @samp{Q} will print a @samp{/} and a denominator, if needed.
  6087  @samp{F} behaves like a float.  For example,
  6088  
  6089  @example
  6090  mpz_t z;
  6091  gmp_printf ("%s is an mpz %Zd\n", "here", z);
  6092  
  6093  mpq_t q;
  6094  gmp_printf ("a hex rational: %#40Qx\n", q);
  6095  
  6096  mpf_t f;
  6097  int   n;
  6098  gmp_printf ("fixed point mpf %.*Ff with %d digits\n", n, f, n);
  6099  
  6100  mp_limb_t l;
  6101  gmp_printf ("limb %Mu\n", l);
  6102  
  6103  const mp_limb_t *ptr;
  6104  mp_size_t       size;
  6105  gmp_printf ("limb array %Nx\n", ptr, size);
  6106  @end example
  6107  
  6108  For @samp{N} the limbs are expected least significant first, as per the
  6109  @code{mpn} functions (@pxref{Low-level Functions}).  A negative size can be
  6110  given to print the value as a negative.
  6111  
  6112  All the standard C @code{printf} types behave the same as the C library
  6113  @code{printf}, and can be freely intermixed with the GMP extensions.  In the
  6114  current implementation the standard parts of the format string are simply
  6115  handed to @code{printf} and only the GMP extensions handled directly.
  6116  
  6117  The flags accepted are as follows.  GLIBC style @nisamp{'} is only for the
  6118  standard C types (not the GMP types), and only if the C library supports it.
  6119  
  6120  @quotation
  6121  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6122  @item @nicode{0} @tab pad with zeros (rather than spaces)
  6123  @item @nicode{#} @tab show the base with @samp{0x}, @samp{0X} or @samp{0}
  6124  @item @nicode{+} @tab always show a sign
  6125  @item (space)    @tab show a space or a @samp{-} sign
  6126  @item @nicode{'} @tab group digits, GLIBC style (not GMP types)
  6127  @end multitable
  6128  @end quotation
  6129  
  6130  The optional width and precision can be given as a number within the format
  6131  string, or as a @samp{*} to take an extra parameter of type @code{int}, the
  6132  same as the standard @code{printf}.
  6133  
  6134  The standard types accepted are as follows.  @samp{h} and @samp{l} are
  6135  portable, the rest will depend on the compiler (or include files) for the type
  6136  and the C library for the output.
  6137  
  6138  @quotation
  6139  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6140  @item @nicode{h}  @tab @nicode{short}
  6141  @item @nicode{hh} @tab @nicode{char}
  6142  @item @nicode{j}  @tab @nicode{intmax_t} or @nicode{uintmax_t}
  6143  @item @nicode{l}  @tab @nicode{long} or @nicode{wchar_t}
  6144  @item @nicode{ll} @tab @nicode{long long}
  6145  @item @nicode{L}  @tab @nicode{long double}
  6146  @item @nicode{q}  @tab @nicode{quad_t} or @nicode{u_quad_t}
  6147  @item @nicode{t}  @tab @nicode{ptrdiff_t}
  6148  @item @nicode{z}  @tab @nicode{size_t}
  6149  @end multitable
  6150  @end quotation
  6151  
  6152  @noindent
  6153  The GMP types are
  6154  
  6155  @quotation
  6156  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6157  @item @nicode{F}  @tab @nicode{mpf_t}, float conversions
  6158  @item @nicode{Q}  @tab @nicode{mpq_t}, integer conversions
  6159  @item @nicode{M}  @tab @nicode{mp_limb_t}, integer conversions
  6160  @item @nicode{N}  @tab @nicode{mp_limb_t} array, integer conversions
  6161  @item @nicode{Z}  @tab @nicode{mpz_t}, integer conversions
  6162  @end multitable
  6163  @end quotation
  6164  
  6165  The conversions accepted are as follows.  @samp{a} and @samp{A} are always
  6166  supported for @code{mpf_t} but depend on the C library for standard C float
  6167  types.  @samp{m} and @samp{p} depend on the C library.
  6168  
  6169  @quotation
  6170  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6171  @item @nicode{a} @nicode{A} @tab hex floats, C99 style
  6172  @item @nicode{c}            @tab character
  6173  @item @nicode{d}            @tab decimal integer
  6174  @item @nicode{e} @nicode{E} @tab scientific format float
  6175  @item @nicode{f}            @tab fixed point float
  6176  @item @nicode{i}            @tab same as @nicode{d}
  6177  @item @nicode{g} @nicode{G} @tab fixed or scientific float
  6178  @item @nicode{m}            @tab @code{strerror} string, GLIBC style
  6179  @item @nicode{n}            @tab store characters written so far
  6180  @item @nicode{o}            @tab octal integer
  6181  @item @nicode{p}            @tab pointer
  6182  @item @nicode{s}            @tab string
  6183  @item @nicode{u}            @tab unsigned integer
  6184  @item @nicode{x} @nicode{X} @tab hex integer
  6185  @end multitable
  6186  @end quotation
  6187  
  6188  @samp{o}, @samp{x} and @samp{X} are unsigned for the standard C types, but for
  6189  types @samp{Z}, @samp{Q} and @samp{N} they are signed.  @samp{u} is not
  6190  meaningful for @samp{Z}, @samp{Q} and @samp{N}.
  6191  
  6192  @samp{M} is a proxy for the C library @samp{l} or @samp{L}, according to the
  6193  size of @code{mp_limb_t}.  Unsigned conversions will be usual, but a signed
  6194  conversion can be used and will interpret the value as a twos complement
  6195  negative.
  6196  
  6197  @samp{n} can be used with any type, even the GMP types.
  6198  
  6199  Other types or conversions that might be accepted by the C library
  6200  @code{printf} cannot be used through @code{gmp_printf}; this includes, for
  6201  instance, extensions registered with GLIBC @code{register_printf_function}.
  6202  Also currently there's no support for POSIX @samp{$} style numbered arguments
  6203  (perhaps this will be added in the future).
  6204  
  6205  The precision field has its usual meaning for integer @samp{Z} and float
  6206  @samp{F} types, but is currently undefined for @samp{Q} and should not be used
  6207  with that.
  6208  
  6209  @code{mpf_t} conversions only ever generate as many digits as can be
  6210  accurately represented by the operand, the same as @code{mpf_get_str} does.
  6211  Zeros will be used if necessary to pad to the requested precision.  This
  6212  happens even for an @samp{f} conversion of an @code{mpf_t} which is an
  6213  integer, for instance @math{2^@W{1024}} in an @code{mpf_t} of 128 bits
  6214  precision will only produce about 40 digits, then pad with zeros to the
  6215  decimal point.  An empty precision field like @samp{%.Fe} or @samp{%.Ff} can
  6216  be used to specifically request just the significant digits.  Without any dot
  6217  and thus no precision field, a precision value of 6 will be used.  Note that
  6218  these rules mean that @samp{%Ff}, @samp{%.Ff}, and @samp{%.0Ff} will all be
  6219  different.
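
The difference between the three forms can be explored with a small helper
such as the following sketch.  The name @code{format_mpf} and the choice of a
128-bit @code{mpf_t} are assumptions made for this example.

@example
#include <stddef.h>
#include <gmp.h>

/* Hypothetical helper: format D through a 128-bit mpf_t using
   FMT, which must contain exactly one F type conversion.  */
int
format_mpf (char *buf, size_t size, const char *fmt, double d)
@{
  mpf_t f;
  int len;

  mpf_init2 (f, 128);
  mpf_set_d (f, d);
  len = gmp_snprintf (buf, size, fmt, f);
  mpf_clear (f);
  return len;
@}
@end example

Calling this with the formats @code{"%Ff"}, @code{"%.Ff"} and @code{"%.0Ff"}
on the same value shows the default precision of 6, the significant digits
only, and an explicit precision of zero respectively.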
  6220  
  6221  The decimal point character (or string) is taken from the current locale
  6222  settings on systems which provide @code{localeconv} (@pxref{Locales,, Locales
  6223  and Internationalization, libc, The GNU C Library Reference Manual}).  The C
  6224  library will normally do the same for standard float output.
  6225  
  6226  The format string is only interpreted as plain @code{char}s; multibyte
  6227  characters are not recognised.  Perhaps this will change in the future.
  6228  
  6229  
  6230  @node Formatted Output Functions, C++ Formatted Output, Formatted Output Strings, Formatted Output
  6231  @section Functions
  6232  @cindex Output functions
  6233  
  6234  Each of the following functions is similar to the corresponding C library
  6235  function.  The basic @code{printf} forms take a variable argument list.  The
  6236  @code{vprintf} forms take an argument pointer, see @ref{Variadic Functions,,
  6237  Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
  6238  va_start}.
  6239  
  6240  It should be emphasised that if a format string is invalid, or the arguments
  6241  don't match what the format specifies, then the behaviour of any of these
  6242  functions will be unpredictable.  GCC format string checking is not available,
  6243  since it doesn't recognise the GMP extensions.
  6244  
  6245  The file based functions @code{gmp_printf} and @code{gmp_fprintf} will return
  6246  @math{-1} to indicate a write error.  Output is not ``atomic'', so partial
  6247  output may be produced if a write error occurs.  All the functions can return
  6248  @math{-1} if the C library @code{printf} variant in use returns @math{-1}, but
  6249  this shouldn't normally occur.
  6250  
  6251  @deftypefun int gmp_printf (const char *@var{fmt}, @dots{})
  6252  @deftypefunx int gmp_vprintf (const char *@var{fmt}, va_list @var{ap})
  6253  Print to the standard output @code{stdout}.  Return the number of characters
  6254  written, or @math{-1} if an error occurred.
  6255  @end deftypefun
  6256  
  6257  @deftypefun int gmp_fprintf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
  6258  @deftypefunx int gmp_vfprintf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
  6259  Print to the stream @var{fp}.  Return the number of characters written, or
  6260  @math{-1} if an error occurred.
  6261  @end deftypefun
  6262  
  6263  @deftypefun int gmp_sprintf (char *@var{buf}, const char *@var{fmt}, @dots{})
  6264  @deftypefunx int gmp_vsprintf (char *@var{buf}, const char *@var{fmt}, va_list @var{ap})
  6265  Form a null-terminated string in @var{buf}.  Return the number of characters
  6266  written, excluding the terminating null.
  6267  
  6268  No overlap is permitted between the space at @var{buf} and the string
  6269  @var{fmt}.
  6270  
  6271  These functions are not recommended, since there's no protection against
  6272  exceeding the space available at @var{buf}.
  6273  @end deftypefun
  6274  
  6275  @deftypefun int gmp_snprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, @dots{})
  6276  @deftypefunx int gmp_vsnprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, va_list @var{ap})
  6277  Form a null-terminated string in @var{buf}.  No more than @var{size} bytes
  6278  will be written.  To get the full output, @var{size} must be enough for the
  6279  string and null-terminator.
  6280  
  6281  The return value is the total number of characters which ought to have been
  6282  produced, excluding the terminating null.  If @math{@var{retval} @ge{}
  6283  @var{size}} then the actual output has been truncated to the first
  6284  @math{@var{size}-1} characters, and a null appended.
  6285  
  6286  No overlap is permitted between the region @{@var{buf},@var{size}@} and the
  6287  @var{fmt} string.
  6288  
  6289  Notice the return value is in ISO C99 @code{snprintf} style.  This is so even
  6290  if the C library @code{vsnprintf} is the older GLIBC 2.0.x style.
  6291  @end deftypefun
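
The C99 style return value makes it possible to size a buffer exactly.  The
sketch below first formats into a small buffer to learn the required length,
then formats again into a block of the right size.  The helper name
@code{mpz_to_string} and the use of @code{malloc} are choices made for this
example only.

@example
#include <stdlib.h>
#include <gmp.h>

/* Hypothetical helper: Z in decimal as a malloc'ed string, or
   NULL on failure.  The caller releases it with free().  */
char *
mpz_to_string (const mpz_t z)
@{
  char small[16];
  char *out;
  int need;

  /* the return value is the full length, even if truncated */
  need = gmp_snprintf (small, sizeof small, "%Zd", z);
  if (need < 0)
    return NULL;

  out = malloc ((size_t) need + 1);
  if (out == NULL)
    return NULL;

  /* need+1 bytes hold the whole string and the null */
  gmp_snprintf (out, (size_t) need + 1, "%Zd", z);
  return out;
@}
@end example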
  6292  
  6293  @deftypefun int gmp_asprintf (char **@var{pp}, const char *@var{fmt}, @dots{})
  6294  @deftypefunx int gmp_vasprintf (char **@var{pp}, const char *@var{fmt}, va_list @var{ap})
  6295  Form a null-terminated string in a block of memory obtained from the current
  6296  memory allocation function (@pxref{Custom Allocation}).  The block will be the
  6297  size of the string and null-terminator.  The address of the block is stored to
  6298  *@var{pp}.  The return value is the number of characters produced, excluding
  6299  the null-terminator.
  6300  
  6301  Unlike the C library @code{asprintf}, @code{gmp_asprintf} doesn't return
  6302  @math{-1} if there's no more memory available, it lets the current allocation
  6303  function handle that.
  6304  @end deftypefun
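
Because the block comes from the current allocation function, it should be
released with the matching free function rather than a plain @code{free}.  A
minimal sketch follows, using @code{mp_get_memory_functions}
(@pxref{Custom Allocation}).  The helper names @code{free_gmp_string} and
@code{print_value} are invented for this example.

@example
#include <stdio.h>
#include <string.h>
#include <gmp.h>

/* Hypothetical helper: release a string from gmp_asprintf.  The
   block size is the string length plus the null-terminator.  */
void
free_gmp_string (char *str)
@{
  void (*freefunc) (void *, size_t);

  mp_get_memory_functions (NULL, NULL, &freefunc);
  freefunc (str, strlen (str) + 1);
@}

/* Hypothetical helper: print Z with a label and return the
   number of characters that were formatted.  */
int
print_value (const mpz_t z)
@{
  char *str;
  int len = gmp_asprintf (&str, "value %Zd", z);

  puts (str);
  free_gmp_string (str);
  return len;
@}
@end example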
  6305  
  6306  @deftypefun int gmp_obstack_printf (struct obstack *@var{ob}, const char *@var{fmt}, @dots{})
  6307  @deftypefunx int gmp_obstack_vprintf (struct obstack *@var{ob}, const char *@var{fmt}, va_list @var{ap})
  6308  @cindex @code{obstack} output
  6309  Append to the current object in @var{ob}.  The return value is the number of
  6310  characters written.  A null-terminator is not written.
  6311  
  6312  @var{fmt} cannot be within the current object in @var{ob}, since that object
  6313  might move as it grows.
  6314  
  6315  These functions are available only when the C library provides the obstack
  6316  feature, which probably means only on GNU systems, see @ref{Obstacks,,
  6317  Obstacks, libc, The GNU C Library Reference Manual}.
  6318  @end deftypefun
  6319  
  6320  
  6321  @node C++ Formatted Output,  , Formatted Output Functions, Formatted Output
  6322  @section C++ Formatted Output
  6323  @cindex C++ @code{ostream} output
  6324  @cindex @code{ostream} output
  6325  
  6326  The following functions are provided in @file{libgmpxx} (@pxref{Headers and
  6327  Libraries}), which is built if C++ support is enabled (@pxref{Build Options}).
  6328  Prototypes are available from @code{<gmp.h>}.
  6329  
  6330  @deftypefun ostream& operator<< (ostream& @var{stream}, const mpz_t @var{op})
  6331  Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
  6332  @code{ios::width} is reset to 0 after output, the same as the standard
  6333  @code{ostream operator<<} routines do.
  6334  
  6335  In hex or octal, @var{op} is printed as a signed number, the same as for
  6336  decimal.  This is unlike the standard @code{operator<<} routines on @code{int}
  6337  etc, which instead give twos complement.
  6338  @end deftypefun
  6339  
  6340  @deftypefun ostream& operator<< (ostream& @var{stream}, const mpq_t @var{op})
  6341  Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
  6342  @code{ios::width} is reset to 0 after output, the same as the standard
  6343  @code{ostream operator<<} routines do.
  6344  
  6345  Output will be a fraction like @samp{5/9}, or if the denominator is 1 then
  6346  just a plain integer like @samp{123}.
  6347  
  6348  In hex or octal, @var{op} is printed as a signed value, the same as for
  6349  decimal.  If @code{ios::showbase} is set then a base indicator is shown on
  6350  both the numerator and denominator (if the denominator is required).
  6351  @end deftypefun
  6352  
  6353  @deftypefun ostream& operator<< (ostream& @var{stream}, const mpf_t @var{op})
  6354  Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
  6355  @code{ios::width} is reset to 0 after output, the same as the standard
  6356  @code{ostream operator<<} routines do.
  6357  
  6358  The decimal point follows the standard library float @code{operator<<}, which
  6359  on recent systems means the @code{std::locale} imbued on @var{stream}.
  6360  
  6361  Hex and octal are supported, unlike the standard @code{operator<<} on
  6362  @code{double}.  The mantissa will be in hex or octal, the exponent will be in
  6363  decimal.  For hex the exponent delimiter is an @samp{@@}.  This is as per
  6364  @code{mpf_out_str}.
  6365  
  6366  @code{ios::showbase} is supported, and will put a base on the mantissa, for
  6367  example hex @samp{0x1.8} or @samp{0x0.8}, or octal @samp{01.4} or @samp{00.4}.
  6368  This last form is slightly strange, but at least differentiates itself from
  6369  decimal.
  6370  @end deftypefun
  6371  
  6372  These operators mean that GMP types can be printed in the usual C++ way, for
  6373  example,
  6374  
  6375  @example
  6376  mpz_t  z;
  6377  int    n;
  6378  ...
  6379  cout << "iteration " << n << " value " << z << "\n";
  6380  @end example
  6381  
  6382  But note that @code{ostream} output (and @code{istream} input, @pxref{C++
  6383  Formatted Input}) is the only overloading available for the GMP types and that
  6384  for instance using @code{+} with an @code{mpz_t} will have unpredictable
  6385  results.  For classes with overloading, see @ref{C++ Class Interface}.
  6386  
  6387  
  6388  @node Formatted Input, C++ Class Interface, Formatted Output, Top
  6389  @chapter Formatted Input
  6390  @cindex Formatted input
  6391  @cindex @code{scanf} formatted input
  6392  
  6393  @menu
  6394  * Formatted Input Strings::
  6395  * Formatted Input Functions::
  6396  * C++ Formatted Input::
  6397  @end menu
  6398  
  6399  
  6400  @node Formatted Input Strings, Formatted Input Functions, Formatted Input, Formatted Input
  6401  @section Formatted Input Strings
  6402  
  6403  @code{gmp_scanf} and friends accept format strings similar to the standard C
  6404  @code{scanf} (@pxref{Formatted Input,, Formatted Input, libc, The GNU C
  6405  Library Reference Manual}).  A format specification is of the form
  6406  
  6407  @example
  6408  % [flags] [width] [type] conv
  6409  @end example
  6410  
  6411  GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
  6412  and @code{mpf_t} respectively.  @samp{Z} and @samp{Q} behave like integers.
  6413  @samp{Q} will read a @samp{/} and a denominator, if present.  @samp{F} behaves
  6414  like a float.
  6415  
  6416  GMP variables don't require an @code{&} when passed to @code{gmp_scanf}, since
  6417  they're already ``call-by-reference''.  For example,
  6418  
  6419  @example
  6420  /* to read say "a(5) = 1234" */
  6421  int   n;
  6422  mpz_t z;
  6423  gmp_scanf ("a(%d) = %Zd\n", &n, z);
  6424  
  6425  mpq_t q1, q2;
  6426  gmp_sscanf ("0377 + 0x10/0x11", "%Qi + %Qi", q1, q2);
  6427  
  6428  /* to read say "topleft (1.55,-2.66)" */
  6429  mpf_t x, y;
  6430  char  buf[32];
  6431  gmp_scanf ("%31s (%Ff,%Ff)", buf, x, y);
  6432  @end example
  6433  
  6434  All the standard C @code{scanf} types behave the same as in the C library
  6435  @code{scanf}, and can be freely intermixed with the GMP extensions.  In the
  6436  current implementation the standard parts of the format string are simply
  6437  handed to @code{scanf} and only the GMP extensions handled directly.
  6438  
  6439  The flags accepted are as follows.  @samp{a} and @samp{'} will depend on
  6440  support from the C library, and @samp{'} cannot be used with GMP types.
  6441  
  6442  @quotation
  6443  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6444  @item @nicode{*} @tab read but don't store
  6445  @item @nicode{a} @tab allocate a buffer (string conversions)
  6446  @item @nicode{'} @tab grouped digits, GLIBC style (not GMP types)
  6447  @end multitable
  6448  @end quotation
  6449  
  6450  The standard types accepted are as follows.  @samp{h} and @samp{l} are
  6451  portable, the rest will depend on the compiler (or include files) for the type
  6452  and the C library for the input.
  6453  
  6454  @quotation
  6455  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6456  @item @nicode{h}  @tab @nicode{short}
  6457  @item @nicode{hh} @tab @nicode{char}
  6458  @item @nicode{j}  @tab @nicode{intmax_t} or @nicode{uintmax_t}
  6459  @item @nicode{l}  @tab @nicode{long int}, @nicode{double} or @nicode{wchar_t}
  6460  @item @nicode{ll} @tab @nicode{long long}
  6461  @item @nicode{L}  @tab @nicode{long double}
  6462  @item @nicode{q}  @tab @nicode{quad_t} or @nicode{u_quad_t}
  6463  @item @nicode{t}  @tab @nicode{ptrdiff_t}
  6464  @item @nicode{z}  @tab @nicode{size_t}
  6465  @end multitable
  6466  @end quotation
  6467  
  6468  @noindent
  6469  The GMP types are
  6470  
  6471  @quotation
  6472  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6473  @item @nicode{F}  @tab @nicode{mpf_t}, float conversions
  6474  @item @nicode{Q}  @tab @nicode{mpq_t}, integer conversions
  6475  @item @nicode{Z}  @tab @nicode{mpz_t}, integer conversions
  6476  @end multitable
  6477  @end quotation
  6478  
  6479  The conversions accepted are as follows.  @samp{p} and @samp{[} will depend on
  6480  support from the C library, the rest are standard.
  6481  
  6482  @quotation
  6483  @multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  6484  @item @nicode{c}            @tab character or characters
  6485  @item @nicode{d}            @tab decimal integer
  6486  @item @nicode{e} @nicode{E} @nicode{f} @nicode{g} @nicode{G}
  6487                              @tab float
  6488  @item @nicode{i}            @tab integer with base indicator
  6489  @item @nicode{n}            @tab characters read so far
  6490  @item @nicode{o}            @tab octal integer
  6491  @item @nicode{p}            @tab pointer
  6492  @item @nicode{s}            @tab string of non-whitespace characters
  6493  @item @nicode{u}            @tab decimal integer
  6494  @item @nicode{x} @nicode{X} @tab hex integer
  6495  @item @nicode{[}            @tab string of characters in a set
  6496  @end multitable
  6497  @end quotation
  6498  
  6499  @samp{e}, @samp{E}, @samp{f}, @samp{g} and @samp{G} are identical: they all
  6500  read either fixed point or scientific format, and accept either upper or lower
  6501  case @samp{e} for the exponent in scientific format.
  6502  
  6503  C99 style hex float format (@code{printf %a}, @pxref{Formatted Output
  6504  Strings}) is always accepted for @code{mpf_t}, but for the standard float
  6505  types it will depend on the C library.
  6506  
  6507  @samp{x} and @samp{X} are identical; each accepts both upper and lower case
  6508  hexadecimal digits.
  6509  
  6510  @samp{o}, @samp{u}, @samp{x} and @samp{X} all read positive or negative
  6511  values.  For the standard C types these are described as ``unsigned''
  6512  conversions, but that merely affects certain overflow handling, negatives are
  6513  still allowed (per @code{strtoul}, @pxref{Parsing of Integers,, Parsing of
  6514  Integers, libc, The GNU C Library Reference Manual}).  For GMP types there are
  6515  no overflows, so @samp{d} and @samp{u} are identical.
  6516  
  6517  @samp{Q} type reads the numerator and (optional) denominator as given.  If the
  6518  value might not be in canonical form then @code{mpq_canonicalize} must be
  6519  called before using it in any calculations (@pxref{Rational Number
  6520  Functions}).
  6521  
  6522  @samp{Qi} will read a base specification separately for the numerator and
  6523  denominator.  For example @samp{0x10/11} would be 16/11, whereas
  6524  @samp{0x10/0x11} would be 16/17.
  6525  
  6526  @samp{n} can be used with any of the types above, even the GMP types.
  6527  @samp{*} to suppress assignment is allowed, though in that case it would do
  6528  nothing at all.
  6529  
  6530  Other conversions or types that might be accepted by the C library
  6531  @code{scanf} cannot be used through @code{gmp_scanf}.
  6532  
  6533  Whitespace is read and discarded before a field, except for @samp{c} and
  6534  @samp{[} conversions.
  6535  
  6536  For float conversions, the decimal point character (or string) expected is
  6537  taken from the current locale settings on systems which provide
  6538  @code{localeconv} (@pxref{Locales,, Locales and Internationalization, libc,
  6539  The GNU C Library Reference Manual}).  The C library will normally do the same
  6540  for standard float input.
  6541  
  6542  The format string is only interpreted as plain @code{char}s, multibyte
  6543  characters are not recognised.  Perhaps this will change in the future.
  6544  
  6545  
  6546  @node Formatted Input Functions, C++ Formatted Input, Formatted Input Strings, Formatted Input
  6547  @section Formatted Input Functions
  6548  @cindex Input functions
  6549  
  6550  Each of the following functions is similar to the corresponding C library
  6551  function.  The plain @code{scanf} forms take a variable argument list.  The
  6552  @code{vscanf} forms take an argument pointer, see @ref{Variadic Functions,,
  6553  Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
  6554  va_start}.
  6555  
  6556  It should be emphasised that if a format string is invalid, or the arguments
  6557  don't match what the format specifies, then the behaviour of any of these
  6558  functions will be unpredictable.  GCC format string checking is not available,
  6559  since it doesn't recognise the GMP extensions.
  6560  
  6561  No overlap is permitted between the @var{fmt} string and any of the results
  6562  produced.
  6563  
  6564  @deftypefun int gmp_scanf (const char *@var{fmt}, @dots{})
  6565  @deftypefunx int gmp_vscanf (const char *@var{fmt}, va_list @var{ap})
  6566  Read from the standard input @code{stdin}.
  6567  @end deftypefun
  6568  
  6569  @deftypefun int gmp_fscanf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
  6570  @deftypefunx int gmp_vfscanf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
  6571  Read from the stream @var{fp}.
  6572  @end deftypefun
  6573  
  6574  @deftypefun int gmp_sscanf (const char *@var{s}, const char *@var{fmt}, @dots{})
  6575  @deftypefunx int gmp_vsscanf (const char *@var{s}, const char *@var{fmt}, va_list @var{ap})
  6576  Read from a null-terminated string @var{s}.
  6577  @end deftypefun
  6578  
  6579  The return value from each of these functions is the same as the standard C99
  6580  @code{scanf}, namely the number of fields successfully parsed and stored.
  6581  @samp{%n} fields and fields read but suppressed by @samp{*} don't count
  6582  towards the return value.
  6583  
  6584  If end of input (or a file error) is reached before a character for a field or
  6585  a literal, and if no previous non-suppressed fields have matched, then the
  6586  return value is @code{EOF} instead of 0.  A whitespace character in the format
  6587  string is only an optional match and doesn't induce an @code{EOF} in this
  6588  fashion.  Leading whitespace read and discarded for a field doesn't count as
  6589  characters for that field.
  6590  
  6591  For the GMP types, input parsing follows C99 rules, namely one character of
  6592  lookahead is used and characters are read while they continue to meet the
  6593  format requirements.  If this doesn't provide a complete number then the
  6594  function terminates, with that field not stored nor counted towards the return
  6595  value.  For instance with @code{mpf_t} an input @samp{1.23e-XYZ} would be read
  6596  up to the @samp{X} and that character pushed back since it's not a digit.  The
  6597  string @samp{1.23e-} would then be considered invalid since an @samp{e} must
  6598  be followed by at least one digit.
  6599  
  6600  For the standard C types, in the current implementation GMP calls the C
  6601  library @code{scanf} functions, which might have looser rules about what
  6602  constitutes a valid input.
  6603  
  6604  Note that @code{gmp_sscanf} is the same as @code{gmp_fscanf} and only does one
  6605  character of lookahead when parsing.  Although clearly it could look at its
  6606  entire input, it is deliberately made identical to @code{gmp_fscanf}, the same
  6607  way C99 @code{sscanf} is the same as @code{fscanf}.
  6608  
  6609  
  6610  @node C++ Formatted Input,  , Formatted Input Functions, Formatted Input
  6611  @section C++ Formatted Input
  6612  @cindex C++ @code{istream} input
  6613  @cindex @code{istream} input
  6614  
  6615  The following functions are provided in @file{libgmpxx} (@pxref{Headers and
  6616  Libraries}), which is built only if C++ support is enabled (@pxref{Build
  6617  Options}).  Prototypes are available from @code{<gmp.h>}.
  6618  
  6619  @deftypefun istream& operator>> (istream& @var{stream}, mpz_t @var{rop})
  6620  Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
  6621  @end deftypefun
  6622  
  6623  @deftypefun istream& operator>> (istream& @var{stream}, mpq_t @var{rop})
  6624  An integer like @samp{123} will be read, or a fraction like @samp{5/9}.  No
  6625  whitespace is allowed around the @samp{/}.  If the fraction is not in
  6626  canonical form then @code{mpq_canonicalize} must be called (@pxref{Rational
  6627  Number Functions}) before operating on it.
  6628  
  6629  As per integer input, a @samp{0} or @samp{0x} base indicator is read when
  6630  none of @code{ios::dec}, @code{ios::oct} or @code{ios::hex} are set.  This is
  6631  done separately for numerator and denominator, so that for instance
  6632  @samp{0x10/11} is @math{16/11} and @samp{0x10/0x11} is @math{16/17}.
  6633  @end deftypefun
  6634  
  6635  @deftypefun istream& operator>> (istream& @var{stream}, mpf_t @var{rop})
  6636  Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
  6637  
  6638  Hex or octal floats are not supported, but might be in the future, or perhaps
  6639  it's best to accept only what the standard float @code{operator>>} does.
  6640  @end deftypefun
  6641  
  6642  Note that digit grouping specified by the @code{istream} locale is currently
  6643  not accepted.  Perhaps this will change in the future.
  6644  
  6645  @sp 1
  6646  These operators mean that GMP types can be read in the usual C++ way, for
  6647  example,
  6648  
  6649  @example
  6650  mpz_t  z;
  6651  ...
  6652  cin >> z;
  6653  @end example
  6654  
  6655  But note that @code{istream} input (and @code{ostream} output, @pxref{C++
  6656  Formatted Output}) is the only overloading available for the GMP types and
  6657  that for instance using @code{+} with an @code{mpz_t} will have unpredictable
  6658  results.  For classes with overloading, see @ref{C++ Class Interface}.
  6659  
  6660  
  6661  
  6662  @node C++ Class Interface, Custom Allocation, Formatted Input, Top
  6663  @chapter C++ Class Interface
  6664  @cindex C++ interface
  6665  
  6666  This chapter describes the C++ class based interface to GMP.
  6667  
  6668  All GMP C language types and functions can be used in C++ programs, since
  6669  @file{gmp.h} has @code{extern "C"} qualifiers, but the class interface offers
  6670  overloaded functions and operators which may be more convenient.
  6671  
  6672  Due to the implementation of this interface, a reasonably recent C++ compiler
  6673  is required, one supporting namespaces, partial specialization of templates
  6674  and member templates.
  6675  
  6676  @strong{Everything described in this chapter is to be considered preliminary
  6677  and might be subject to incompatible changes if some unforeseen difficulty
  6678  reveals itself.}
  6679  
  6680  @menu
  6681  * C++ Interface General::
  6682  * C++ Interface Integers::
  6683  * C++ Interface Rationals::
  6684  * C++ Interface Floats::
  6685  * C++ Interface Random Numbers::
  6686  * C++ Interface Limitations::
  6687  @end menu
  6688  
  6689  
  6690  @node C++ Interface General, C++ Interface Integers, C++ Class Interface, C++ Class Interface
  6691  @section C++ Interface General
  6692  
  6693  @noindent
  6694  All the C++ classes and functions are available with
  6695  
  6696  @cindex @code{gmpxx.h}
  6697  @example
  6698  #include <gmpxx.h>
  6699  @end example
  6700  
  6701  Programs should be linked with the @file{libgmpxx} and @file{libgmp}
  6702  libraries.  For example,
  6703  
  6704  @example
  6705  g++ mycxxprog.cc -lgmpxx -lgmp
  6706  @end example
  6707  
  6708  @noindent
  6709  The classes defined are
  6710  
  6711  @deftp Class mpz_class
  6712  @deftpx Class mpq_class
  6713  @deftpx Class mpf_class
  6714  @end deftp
  6715  
  6716  The standard operators and various standard functions are overloaded to allow
  6717  arithmetic with these classes.  For example,
  6718  
  6719  @example
  6720  int
  6721  main (void)
  6722  @{
  6723    mpz_class a, b, c;
  6724  
  6725    a = 1234;
  6726    b = "-5678";
  6727    c = a+b;
  6728    cout << "sum is " << c << "\n";
  6729    cout << "absolute value is " << abs(c) << "\n";
  6730  
  6731    return 0;
  6732  @}
  6733  @end example
  6734  
  6735  An important feature of the implementation is that an expression like
  6736  @code{a=b+c} results in a single call to the corresponding @code{mpz_add},
  6737  without using a temporary for the @code{b+c} part.  Expressions which by their
  6738  nature imply intermediate values, like @code{a=b*c+d*e}, still use temporaries
  6739  though.
  6740  
  6741  The classes can be freely intermixed in expressions, as can the classes and
  6742  the standard types @code{long}, @code{unsigned long} and @code{double}.
  6743  Smaller types like @code{int} or @code{float} can also be intermixed, since
  6744  C++ will promote them.
  6745  
  6746  Note that @code{bool} is not accepted directly, but must be explicitly cast to
  6747  an @code{int} first.  This is because C++ will automatically convert any
  6748  pointer to a @code{bool}, so if GMP accepted @code{bool} it would make all
  6749  sorts of invalid class and pointer combinations compile but almost certainly
  6750  not do anything sensible.
  6751  
  6752  Conversions back from the classes to standard C++ types aren't done
  6753  automatically, instead member functions like @code{get_si} are provided (see
  6754  the following sections for details).
  6755  
  6756  Also there are no automatic conversions from the classes to the corresponding
  6757  GMP C types, instead a reference to the underlying C object can be obtained
  6758  with the following functions,
  6759  
  6760  @deftypefun mpz_t mpz_class::get_mpz_t ()
  6761  @deftypefunx mpq_t mpq_class::get_mpq_t ()
  6762  @deftypefunx mpf_t mpf_class::get_mpf_t ()
  6763  @end deftypefun
  6764  
  6765  These can be used to call a C function which doesn't have a C++ class
  6766  interface.  For example to set @code{a} to the GCD of @code{b} and @code{c},
  6767  
  6768  @example
  6769  mpz_class a, b, c;
  6770  ...
  6771  mpz_gcd (a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t());
  6772  @end example
  6773  
  6774  In the other direction, a class can be initialized from the corresponding GMP
  6775  C type, or assigned to if an explicit constructor is used.  In both cases this
  6776  makes a copy of the value; it doesn't create any sort of association.  For
  6777  example,
  6778  
  6779  @example
  6780  mpz_t z;
  6781  // ... init and calculate z ...
  6782  mpz_class x(z);
  6783  mpz_class y;
  6784  y = mpz_class (z);
  6785  @end example
  6786  
  6787  There are no namespace setups in @file{gmpxx.h}, all types and functions are
  6788  simply put into the global namespace.  This is what @file{gmp.h} has done in
  6789  the past, and continues to do for compatibility.  The extras provided by
  6790  @file{gmpxx.h} follow GMP naming conventions and are unlikely to clash with
  6791  anything.
  6792  
  6793  
  6794  @node C++ Interface Integers, C++ Interface Rationals, C++ Interface General, C++ Class Interface
  6795  @section C++ Interface Integers
  6796  
  6797  @deftypefun {} mpz_class::mpz_class (type @var{n})
  6798  Construct an @code{mpz_class}.  All the standard C++ types may be used, except
  6799  @code{long long} and @code{long double}, and all the GMP C++ classes can be
  6800  used, although conversions from @code{mpq_class} and @code{mpf_class} are
  6801  @code{explicit}.  Any necessary conversion follows the corresponding C
  6802  function, for example @code{double} follows @code{mpz_set_d}
  6803  (@pxref{Assigning Integers}).
  6804  @end deftypefun
  6805  
  6806  @deftypefun explicit mpz_class::mpz_class (const mpz_t @var{z})
  6807  Construct an @code{mpz_class} from an @code{mpz_t}.  The value in @var{z} is
  6808  copied into the new @code{mpz_class}, there won't be any permanent association
  6809  between it and @var{z}.
  6810  @end deftypefun
  6811  
  6812  @deftypefun explicit mpz_class::mpz_class (const char *@var{s}, int @var{base} = 0)
  6813  @deftypefunx explicit mpz_class::mpz_class (const string& @var{s}, int @var{base} = 0)
  6814  Construct an @code{mpz_class} converted from a string using @code{mpz_set_str}
  6815  (@pxref{Assigning Integers}).
  6816  
  6817  If the string is not a valid integer, an @code{std::invalid_argument}
  6818  exception is thrown.  The same applies to @code{operator=}.
  6819  @end deftypefun
  6820  
  6821  @deftypefun mpz_class operator"" _mpz (const char *@var{str})
  6822  With C++11 compilers, integers can be constructed with the syntax
  6823  @code{123_mpz} which is equivalent to @code{mpz_class("123")}.
  6824  @end deftypefun
  6825  
  6826  @deftypefun mpz_class operator/ (mpz_class @var{a}, mpz_class @var{d})
  6827  @deftypefunx mpz_class operator% (mpz_class @var{a}, mpz_class @var{d})
  6828  Divisions involving @code{mpz_class} round towards zero, as per the
  6829  @code{mpz_tdiv_q} and @code{mpz_tdiv_r} functions (@pxref{Integer Division}).
  6830  This is the same as the C99 @code{/} and @code{%} operators.
  6831  
  6832  The @code{mpz_fdiv@dots{}} or @code{mpz_cdiv@dots{}} functions can always be called
  6833  directly if desired.  For example,
  6834  
  6835  @example
  6836  mpz_class q, a, d;
  6837  ...
  6838  mpz_fdiv_q (q.get_mpz_t(), a.get_mpz_t(), d.get_mpz_t());
  6839  @end example
  6840  @end deftypefun
  6841  
  6842  @deftypefun mpz_class abs (mpz_class @var{op})
  6843  @deftypefunx int cmp (mpz_class @var{op1}, type @var{op2})
  6844  @deftypefunx int cmp (type @var{op1}, mpz_class @var{op2})
  6845  @maybepagebreak
  6846  @deftypefunx bool mpz_class::fits_sint_p (void)
  6847  @deftypefunx bool mpz_class::fits_slong_p (void)
  6848  @deftypefunx bool mpz_class::fits_sshort_p (void)
  6849  @maybepagebreak
  6850  @deftypefunx bool mpz_class::fits_uint_p (void)
  6851  @deftypefunx bool mpz_class::fits_ulong_p (void)
  6852  @deftypefunx bool mpz_class::fits_ushort_p (void)
  6853  @maybepagebreak
  6854  @deftypefunx double mpz_class::get_d (void)
  6855  @deftypefunx long mpz_class::get_si (void)
  6856  @deftypefunx string mpz_class::get_str (int @var{base} = 10)
  6857  @deftypefunx {unsigned long} mpz_class::get_ui (void)
  6858  @maybepagebreak
  6859  @deftypefunx int mpz_class::set_str (const char *@var{str}, int @var{base})
  6860  @deftypefunx int mpz_class::set_str (const string& @var{str}, int @var{base})
  6861  @deftypefunx int sgn (mpz_class @var{op})
  6862  @deftypefunx mpz_class sqrt (mpz_class @var{op})
  6863  @maybepagebreak
  6864  @deftypefunx mpz_class gcd (mpz_class @var{op1}, mpz_class @var{op2})
  6865  @deftypefunx mpz_class lcm (mpz_class @var{op1}, mpz_class @var{op2})
  6866  @maybepagebreak
  6867  @deftypefunx void mpz_class::swap (mpz_class& @var{op})
  6868  @deftypefunx void swap (mpz_class& @var{op1}, mpz_class& @var{op2})
  6869  These functions provide a C++ class interface to the corresponding GMP C
  6870  routines.
  6871  
  6872  @code{cmp} can be used with any of the classes or the standard C++ types,
  6873  except @code{long long} and @code{long double}.
  6874  @end deftypefun
  6875  
  6876  @sp 1
  6877  Overloaded operators for combinations of @code{mpz_class} and @code{double}
  6878  are provided for completeness, but it should be noted that if the given
  6879  @code{double} is not an integer then the way any rounding is done is currently
  6880  unspecified.  The rounding might take place at the start, in the middle, or at
  6881  the end of the operation, and it might change in the future.
  6882  
  6883  Conversions between @code{mpz_class} and @code{double}, however, are defined
  6884  to follow the corresponding C functions @code{mpz_get_d} and @code{mpz_set_d}.
  6885  And comparisons are always made exactly, as per @code{mpz_cmp_d}.
  6886  
  6887  
  6888  @node C++ Interface Rationals, C++ Interface Floats, C++ Interface Integers, C++ Class Interface
  6889  @section C++ Interface Rationals
  6890  
  6891  In all the following constructors, if a fraction is given then it should be in
  6892  canonical form; otherwise @code{mpq_class::canonicalize} must be called on the result.
  6893  
  6894  @deftypefun {} mpq_class::mpq_class (type @var{op})
  6895  @deftypefunx {} mpq_class::mpq_class (integer @var{num}, integer @var{den})
  6896  Construct an @code{mpq_class}.  The initial value can be a single value of any
  6897  type (conversion from @code{mpf_class} is @code{explicit}), or a pair of
  6898  integers (@code{mpz_class} or standard C++ integer types) representing a
  6899  fraction, except that @code{long long} and @code{long double} are not
  6900  supported.  For example,
  6901  
  6902  @example
  6903  mpq_class q (99);
  6904  mpq_class q (1.75);
  6905  mpq_class q (1, 3);
  6906  @end example
  6907  @end deftypefun
  6908  
  6909  @deftypefun explicit mpq_class::mpq_class (const mpq_t @var{q})
  6910  Construct an @code{mpq_class} from an @code{mpq_t}.  The value in @var{q} is
  6911  copied into the new @code{mpq_class}, there won't be any permanent association
  6912  between it and @var{q}.
  6913  @end deftypefun
  6914  
  6915  @deftypefun explicit mpq_class::mpq_class (const char *@var{s}, int @var{base} = 0)
  6916  @deftypefunx explicit mpq_class::mpq_class (const string& @var{s}, int @var{base} = 0)
  6917  Construct an @code{mpq_class} converted from a string using @code{mpq_set_str}
  6918  (@pxref{Initializing Rationals}).
  6919  
  6920  If the string is not a valid rational, an @code{std::invalid_argument}
  6921  exception is thrown.  The same applies to @code{operator=}.
  6922  @end deftypefun
  6923  
  6924  @deftypefun mpq_class operator"" _mpq (const char *@var{str})
  6925  With C++11 compilers, integral rationals can be constructed with the syntax
  6926  @code{123_mpq} which is equivalent to @code{mpq_class(123_mpz)}. Other
  6927  rationals can be built as @code{-1_mpq/2} or @code{0xb_mpq/123456_mpz}.
  6928  @end deftypefun
  6929  
  6930  @deftypefun void mpq_class::canonicalize ()
  6931  Put an @code{mpq_class} into canonical form, as per @ref{Rational Number
  6932  Functions}.  All arithmetic operators require their operands in canonical
  6933  form, and will return results in canonical form.
  6934  @end deftypefun
  6935  
  6936  @deftypefun mpq_class abs (mpq_class @var{op})
  6937  @deftypefunx int cmp (mpq_class @var{op1}, type @var{op2})
  6938  @deftypefunx int cmp (type @var{op1}, mpq_class @var{op2})
  6939  @maybepagebreak
  6940  @deftypefunx double mpq_class::get_d (void)
  6941  @deftypefunx string mpq_class::get_str (int @var{base} = 10)
  6942  @maybepagebreak
  6943  @deftypefunx int mpq_class::set_str (const char *@var{str}, int @var{base})
  6944  @deftypefunx int mpq_class::set_str (const string& @var{str}, int @var{base})
  6945  @deftypefunx int sgn (mpq_class @var{op})
  6946  @maybepagebreak
  6947  @deftypefunx void mpq_class::swap (mpq_class& @var{op})
  6948  @deftypefunx void swap (mpq_class& @var{op1}, mpq_class& @var{op2})
  6949  These functions provide a C++ class interface to the corresponding GMP C
  6950  routines.
  6951  
  6952  @code{cmp} can be used with any of the classes or the standard C++ types,
  6953  except @code{long long} and @code{long double}.
  6954  @end deftypefun
  6955  
  6956  @deftypefun {mpz_class&} mpq_class::get_num ()
  6957  @deftypefunx {mpz_class&} mpq_class::get_den ()
  6958  Get a reference to an @code{mpz_class} which is the numerator or denominator
  6959  of an @code{mpq_class}.  This can be used both for read and write access.  If
  6960  the object returned is modified, it modifies the original @code{mpq_class}.
  6961  
  6962  If direct manipulation might produce a non-canonical value, then
  6963  @code{mpq_class::canonicalize} must be called before further operations.
  6964  @end deftypefun
  6965  
  6966  @deftypefun mpz_t mpq_class::get_num_mpz_t ()
  6967  @deftypefunx mpz_t mpq_class::get_den_mpz_t ()
  6968  Get a reference to the underlying @code{mpz_t} numerator or denominator of an
  6969  @code{mpq_class}.  This can be passed to C functions expecting an
  6970  @code{mpz_t}.  Any modifications made to the @code{mpz_t} will modify the
  6971  original @code{mpq_class}.
  6972  
  6973  If direct manipulation might produce a non-canonical value, then
  6974  @code{mpq_class::canonicalize} must be called before further operations.
  6975  @end deftypefun
  6976  
  6977  @deftypefun istream& operator>> (istream& @var{stream}, mpq_class& @var{rop})
  6978  Read @var{rop} from @var{stream}, using its @code{ios} formatting settings,
  6979  the same as @code{mpq_t operator>>} (@pxref{C++ Formatted Input}).
  6980  
  6981  If the @var{rop} read might not be in canonical form then
  6982  @code{mpq_class::canonicalize} must be called.
  6983  @end deftypefun
  6984  
  6985  
  6986  @node C++ Interface Floats, C++ Interface Random Numbers, C++ Interface Rationals, C++ Class Interface
  6987  @section C++ Interface Floats
  6988  
  6989  When an expression requires the use of temporary intermediate @code{mpf_class}
  6990  values, like @code{f=g*h+x*y}, those temporaries will have the same precision
  6991  as the destination @code{f}.  Explicit constructors can be used if this
  6992  doesn't suit.
  6993  
  6994  @deftypefun {} mpf_class::mpf_class (type @var{op})
  6995  @deftypefunx {} mpf_class::mpf_class (type @var{op}, mp_bitcnt_t @var{prec})
  6996  Construct an @code{mpf_class}.  Any standard C++ type can be used, except
  6997  @code{long long} and @code{long double}, and any of the GMP C++ classes can be
  6998  used.
  6999  
  7000  If @var{prec} is given, the initial precision is that value, in bits.  If
  7001  @var{prec} is not given, then the initial precision is determined by the type
  7002  of @var{op} given.  An @code{mpz_class}, @code{mpq_class}, or C++
  7003  builtin type will give the default @code{mpf} precision (@pxref{Initializing
  7004  Floats}).  An @code{mpf_class} or expression will give the precision of that
  7005  value.  The precision of a binary expression is the higher of the two
  7006  operands.
  7007  
  7008  @example
  7009  mpf_class f(1.5);        // default precision
  7010  mpf_class f(1.5, 500);   // 500 bits (at least)
  7011  mpf_class f(x);          // precision of x
  7012  mpf_class f(abs(x));     // precision of x
  7013  mpf_class f(-g, 1000);   // 1000 bits (at least)
  7014  mpf_class f(x+y);        // greater of precisions of x and y
  7015  @end example
  7016  @end deftypefun
  7017  
  7018  @deftypefun explicit mpf_class::mpf_class (const mpf_t @var{f})
  7019  @deftypefunx {} mpf_class::mpf_class (const mpf_t @var{f}, mp_bitcnt_t @var{prec})
  7020  Construct an @code{mpf_class} from an @code{mpf_t}.  The value in @var{f} is
  7021  copied into the new @code{mpf_class}, there won't be any permanent association
  7022  between it and @var{f}.
  7023  
  7024  If @var{prec} is given, the initial precision is that value, in bits.  If
  7025  @var{prec} is not given, then the initial precision is that of @var{f}.
  7026  @end deftypefun
  7027  
  7028  @deftypefun explicit mpf_class::mpf_class (const char *@var{s})
  7029  @deftypefunx {} mpf_class::mpf_class (const char *@var{s}, mp_bitcnt_t @var{prec}, int @var{base} = 0)
  7030  @deftypefunx explicit mpf_class::mpf_class (const string& @var{s})
  7031  @deftypefunx {} mpf_class::mpf_class (const string& @var{s}, mp_bitcnt_t @var{prec}, int @var{base} = 0)
  7032  Construct an @code{mpf_class} converted from a string using @code{mpf_set_str}
  7033  (@pxref{Assigning Floats}).  If @var{prec} is given, the initial precision is
  7034  that value, in bits.  If not, the default @code{mpf} precision
  7035  (@pxref{Initializing Floats}) is used.
  7036  
  7037  If the string is not a valid float, an @code{std::invalid_argument} exception
  7038  is thrown.  The same applies to @code{operator=}.
  7039  @end deftypefun
  7040  
  7041  @deftypefun mpf_class operator"" _mpf (const char *@var{str})
  7042  With C++11 compilers, floats can be constructed with the syntax
  7043  @code{1.23e-1_mpf} which is equivalent to @code{mpf_class("1.23e-1")}.
  7044  @end deftypefun
  7045  
  7046  @deftypefun {mpf_class&} mpf_class::operator= (type @var{op})
  7047  Convert and store the given @var{op} value to an @code{mpf_class} object.  The
  7048  same types are accepted as for the constructors above.
  7049  
  7050  Note that @code{operator=} only stores a new value, it doesn't copy or change
  7051  the precision of the destination, instead the value is truncated if necessary.
  7052  This is the same as @code{mpf_set} etc.  Note in particular this means for
  7053  @code{mpf_class} a copy constructor is not the same as a default constructor
  7054  plus assignment.
  7055  
  7056  @example
  7057  mpf_class x (y);   // x created with precision of y
  7058  
  7059  mpf_class x;       // x created with default precision
  7060  x = y;             // value truncated to that precision
  7061  @end example
  7062  
  7063  Applications using templated code may need to be careful about the assumptions
  7064  the code makes in this area, when working with @code{mpf_class} values of
  7065  various different or non-default precisions.  For instance implementations of
  7066  the standard @code{complex} template have been seen in both styles above,
  7067  though of course @code{complex} is normally only actually specified for use
  7068  with the builtin float types.
  7069  @end deftypefun
  7070  
  7071  @deftypefun mpf_class abs (mpf_class @var{op})
  7072  @deftypefunx mpf_class ceil (mpf_class @var{op})
  7073  @deftypefunx int cmp (mpf_class @var{op1}, type @var{op2})
  7074  @deftypefunx int cmp (type @var{op1}, mpf_class @var{op2})
  7075  @maybepagebreak
  7076  @deftypefunx bool mpf_class::fits_sint_p (void)
  7077  @deftypefunx bool mpf_class::fits_slong_p (void)
  7078  @deftypefunx bool mpf_class::fits_sshort_p (void)
  7079  @maybepagebreak
  7080  @deftypefunx bool mpf_class::fits_uint_p (void)
  7081  @deftypefunx bool mpf_class::fits_ulong_p (void)
  7082  @deftypefunx bool mpf_class::fits_ushort_p (void)
  7083  @maybepagebreak
  7084  @deftypefunx mpf_class floor (mpf_class @var{op})
  7085  @deftypefunx mpf_class hypot (mpf_class @var{op1}, mpf_class @var{op2})
  7086  @maybepagebreak
  7087  @deftypefunx double mpf_class::get_d (void)
  7088  @deftypefunx long mpf_class::get_si (void)
  7089  @deftypefunx string mpf_class::get_str (mp_exp_t& @var{exp}, int @var{base} = 10, size_t @var{digits} = 0)
  7090  @deftypefunx {unsigned long} mpf_class::get_ui (void)
  7091  @maybepagebreak
  7092  @deftypefunx int mpf_class::set_str (const char *@var{str}, int @var{base})
  7093  @deftypefunx int mpf_class::set_str (const string& @var{str}, int @var{base})
  7094  @deftypefunx int sgn (mpf_class @var{op})
  7095  @deftypefunx mpf_class sqrt (mpf_class @var{op})
  7096  @maybepagebreak
  7097  @deftypefunx void mpf_class::swap (mpf_class& @var{op})
  7098  @deftypefunx void swap (mpf_class& @var{op1}, mpf_class& @var{op2})
  7099  @deftypefunx mpf_class trunc (mpf_class @var{op})
  7100  These functions provide a C++ class interface to the corresponding GMP C
  7101  routines.
  7102  
  7103  @code{cmp} can be used with any of the classes or the standard C++ types,
  7104  except @code{long long} and @code{long double}.
  7105  
  7106  The accuracy provided by @code{hypot} is not currently guaranteed.
  7107  @end deftypefun
  7108  
  7109  @deftypefun {mp_bitcnt_t} mpf_class::get_prec ()
  7110  @deftypefunx void mpf_class::set_prec (mp_bitcnt_t @var{prec})
  7111  @deftypefunx void mpf_class::set_prec_raw (mp_bitcnt_t @var{prec})
  7112  Get or set the current precision of an @code{mpf_class}.
  7113  
  7114  The restrictions described for @code{mpf_set_prec_raw} (@pxref{Initializing
  7115  Floats}) apply to @code{mpf_class::set_prec_raw}.  Note in particular that the
  7116  @code{mpf_class} must be restored to its allocated precision before being
  7117  destroyed.  This must be done by application code, there's no automatic
  7118  mechanism for it.
  7119  @end deftypefun
  7120  
  7121  
  7122  @node C++ Interface Random Numbers, C++ Interface Limitations, C++ Interface Floats, C++ Class Interface
  7123  @section C++ Interface Random Numbers
  7124  
  7125  @deftp Class gmp_randclass
  7126  The C++ class interface to the GMP random number functions uses
  7127  @code{gmp_randclass} to hold an algorithm selection and current state, as per
  7128  @code{gmp_randstate_t}.
  7129  @end deftp
  7130  
  7131  @deftypefun {} gmp_randclass::gmp_randclass (void (*@var{randinit}) (gmp_randstate_t, @dots{}), @dots{})
  7132  Construct a @code{gmp_randclass}, using a call to the given @var{randinit}
  7133  function (@pxref{Random State Initialization}).  The arguments expected are
  7134  the same as @var{randinit}, but with @code{mpz_class} instead of @code{mpz_t}.
  7135  For example,
  7136  
  7137  @example
  7138  gmp_randclass r1 (gmp_randinit_default);
  7139  gmp_randclass r2 (gmp_randinit_lc_2exp_size, 32);
  7140  gmp_randclass r3 (gmp_randinit_lc_2exp, a, c, m2exp);
  7141  gmp_randclass r4 (gmp_randinit_mt);
  7142  @end example
  7143  
  7144  @code{gmp_randinit_lc_2exp_size} will fail if the size requested is too big;
  7145  a @code{std::length_error} exception is thrown in that case.
  7146  @end deftypefun
  7147  
  7148  @deftypefun {} gmp_randclass::gmp_randclass (gmp_randalg_t @var{alg}, @dots{})
  7149  Construct a @code{gmp_randclass} using the same parameters as
  7150  @code{gmp_randinit} (@pxref{Random State Initialization}).  This function is
  7151  obsolete and the above @var{randinit} style should be preferred.
  7152  @end deftypefun
  7153  
  7154  @deftypefun void gmp_randclass::seed (unsigned long int @var{s})
  7155  @deftypefunx void gmp_randclass::seed (mpz_class @var{s})
  7156  Seed a random number generator.  @xref{Random Number Functions}, for how
  7157  to choose a good seed.
  7158  @end deftypefun
  7159  
  7160  @deftypefun mpz_class gmp_randclass::get_z_bits (mp_bitcnt_t @var{bits})
  7161  @deftypefunx mpz_class gmp_randclass::get_z_bits (mpz_class @var{bits})
  7162  Generate a random integer with a specified number of bits.
  7163  @end deftypefun
  7164  
  7165  @deftypefun mpz_class gmp_randclass::get_z_range (mpz_class @var{n})
  7166  Generate a random integer in the range 0 to @math{@var{n}-1} inclusive.
  7167  @end deftypefun
  7168  
  7169  @deftypefun mpf_class gmp_randclass::get_f ()
  7170  @deftypefunx mpf_class gmp_randclass::get_f (mp_bitcnt_t @var{prec})
  7171  Generate a random float @var{f} in the range @math{0 <= @var{f} < 1}.  @var{f}
  7172  will be to @var{prec} bits precision, or if @var{prec} is not given then to
  7173  the precision of the destination.  For example,
  7174  
  7175  @example
  7176  gmp_randclass  r;
  7177  ...
  7178  mpf_class  f (0, 512);   // 512 bits precision
  7179  f = r.get_f();           // random number, 512 bits
  7180  @end example
  7181  @end deftypefun
  7182  
  7183  
  7184  
  7185  @node C++ Interface Limitations,  , C++ Interface Random Numbers, C++ Class Interface
  7186  @section C++ Interface Limitations
  7187  
  7188  @table @asis
  7189  @item @code{mpq_class} and Templated Reading
  7190  A generic piece of template code probably won't know that @code{mpq_class}
  7191  requires a @code{canonicalize} call if inputs read with @code{operator>>}
  7192  might be non-canonical.  This can lead to incorrect results.
  7193  
  7194  @code{operator>>} behaves as it does for reasons of efficiency.  A
  7195  canonicalize can be quite time consuming on large operands, and is best
  7196  avoided if it's not necessary.
  7197  
  7198  But this potential difficulty reduces the usefulness of @code{mpq_class}.
  7199  Perhaps a mechanism to tell @code{operator>>} what to do will be adopted in
  7200  the future, maybe a preprocessor define, a global flag, or an @code{ios} flag
  7201  pressed into service.  Or maybe, at the risk of inconsistency, the
  7202  @code{mpq_class} @code{operator>>} could canonicalize and leave @code{mpq_t}
  7203  @code{operator>>} not doing so, for use on those occasions when that's
  7204  acceptable.  Send feedback or alternate ideas to @email{gmp-bugs@@gmplib.org}.
  7205  
  7206  @item Subclassing
  7207  Subclassing the GMP C++ classes works, but is not currently recommended.
  7208  
  7209  Expressions involving subclasses resolve correctly (or seem to), but in normal
  7210  C++ fashion the subclass doesn't inherit constructors and assignments.
  7211  There are many of those in the GMP classes, and a good way to reestablish them
  7212  in a subclass is not yet provided.
  7213  
  7214  @item Templated Expressions
  7215  A subtle difficulty exists when using expressions together with
  7216  application-defined template functions.  Consider the following, with @code{T}
  7217  intended to be some numeric type,
  7218  
  7219  @example
  7220  template <class T>
  7221  T fun (const T &, const T &);
  7222  @end example
  7223  
  7224  @noindent
  7225  When used with, say, plain @code{mpz_class} variables, it works fine: @code{T}
  7226  is resolved as @code{mpz_class}.
  7227  
  7228  @example
  7229  mpz_class f(1), g(2);
  7230  fun (f, g);    // Good
  7231  @end example
  7232  
  7233  @noindent
  7234  But when one of the arguments is an expression, it doesn't work.
  7235  
  7236  @example
  7237  mpz_class f(1), g(2), h(3);
  7238  fun (f, g+h);  // Bad
  7239  @end example
  7240  
  7241  This is because @code{g+h} ends up being a certain expression template type
  7242  internal to @code{gmpxx.h}, which the C++ template resolution rules are unable
  7243  to automatically convert to @code{mpz_class}.  The workaround is simply to add
  7244  an explicit cast.
  7245  
  7246  @example
  7247  mpz_class f(1), g(2), h(3);
  7248  fun (f, mpz_class(g+h));  // Good
  7249  @end example
  7250  
  7251  Similarly, within @code{fun} it may be necessary to cast an expression to type
  7252  @code{T} when calling a templated @code{fun2}.
  7253  
  7254  @example
  7255  template <class T>
  7256  void fun (T f, T g)
  7257  @{
  7258    fun2 (f, f+g);     // Bad
  7259  @}
  7260  
  7261  template <class T>
  7262  void fun (T f, T g)
  7263  @{
  7264    fun2 (f, T(f+g));  // Good
  7265  @}
  7266  @end example
  7267  
  7268  @item C++11
  7269  C++11 provides several new ways in which types can be inferred: @code{auto},
  7270  @code{decltype}, etc. While they can be very convenient, they don't mix well
  7271  with expression templates. In this example, the addition is performed twice,
  7272  as if we had defined @code{sum} as a macro.
  7273  
  7274  @example
  7275  mpz_class z = 33;
  7276  auto sum = z + z;
  7277  mpz_class prod = sum * sum;
  7278  @end example
  7279  
  7280  This other example may crash, though some compilers might make it look like
  7281  it is working, because the expression @code{z+z} goes out of scope before it
  7282  is evaluated.
  7283  
  7284  @example
  7285  mpz_class z = 33;
  7286  auto sum = z + z + z;
  7287  mpz_class prod = sum * 2;
  7288  @end example
  7289  
  7290  It is thus strongly recommended to avoid @code{auto} anywhere a GMP C++
  7291  expression may appear.
  7292  @end table
  7293  
  7294  
  7295  @node Custom Allocation, Language Bindings, C++ Class Interface, Top
  7296  @comment  node-name,  next,  previous,  up
  7297  @chapter Custom Allocation
  7298  @cindex Custom allocation
  7299  @cindex Memory allocation
  7300  @cindex Allocation of memory
  7301  
  7302  By default GMP uses @code{malloc}, @code{realloc} and @code{free} for memory
  7303  allocation, and if they fail GMP prints a message to the standard error output
  7304  and terminates the program.
  7305  
  7306  Alternate functions can be specified, to allocate memory in a different way or
  7307  to have a different error action on running out of memory.
  7308  
  7309  @deftypefun void mp_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (*@var{free_func_ptr}) (void *, size_t))
  7310  Replace the current allocation functions with those given by the arguments.  If an argument
  7311  is @code{NULL}, the corresponding default function is used.
  7312  
  7313  These functions will be used for all memory allocation done by GMP, apart from
  7314  temporary space from @code{alloca} if that function is available and GMP is
  7315  configured to use it (@pxref{Build Options}).
  7316  
  7317  @strong{Be sure to call @code{mp_set_memory_functions} only when there are no
  7318  active GMP objects allocated using the previous memory functions!  Usually
  7319  that means calling it before any other GMP function.}
  7320  @end deftypefun
  7321  
  7322  The functions supplied should fit the following declarations:
  7323  
  7324  @deftypevr Function {void *} allocate_function (size_t @var{alloc_size})
  7325  Return a pointer to newly allocated space with at least @var{alloc_size}
  7326  bytes.
  7327  @end deftypevr
  7328  
  7329  @deftypevr Function {void *} reallocate_function (void *@var{ptr}, size_t @var{old_size}, size_t @var{new_size})
  7330  Resize a previously allocated block @var{ptr} of @var{old_size} bytes to be
  7331  @var{new_size} bytes.
  7332  
  7333  The block may be moved if necessary or if desired, and in that case the
  7334  smaller of @var{old_size} and @var{new_size} bytes must be copied to the new
  7335  location.  The return value is a pointer to the resized block, that being the
  7336  new location if moved or just @var{ptr} if not.
  7337  
  7338  @var{ptr} is never @code{NULL}; it's always a previously allocated block.
  7339  @var{new_size} may be bigger or smaller than @var{old_size}.
  7340  @end deftypevr
  7341  
  7342  @deftypevr Function void free_function (void *@var{ptr}, size_t @var{size})
  7343  De-allocate the space pointed to by @var{ptr}.
  7344  
  7345  @var{ptr} is never @code{NULL}; it's always a previously allocated block of
  7346  @var{size} bytes.
  7347  @end deftypevr
  7348  
  7349  A @dfn{byte} here means the unit used by the @code{sizeof} operator.
  7350  
  7351  The @var{reallocate_function} parameter @var{old_size} and the
  7352  @var{free_function} parameter @var{size} are passed for convenience, but of
  7353  course they can be ignored if not needed by an implementation.  The default
  7354  functions using @code{malloc} and friends for instance don't use them.
  7355  
  7356  No error return is allowed from any of these functions; if they return then
  7357  they must have performed the specified operation.  In particular note that
  7358  @var{allocate_function} or @var{reallocate_function} mustn't return
  7359  @code{NULL}.
  7360  
  7361  Getting a different fatal error action is a good use for custom allocation
  7362  functions, for example giving a graphical dialog rather than the default print
  7363  to @code{stderr}.  How much is possible when genuinely out of memory is
  7364  another question though.
  7365  
  7366  There's currently no defined way for the allocation functions to recover from
  7367  an error such as out of memory; they must terminate program execution.  A
  7368  @code{longjmp} or throwing a C++ exception will have undefined results.  This
  7369  may change in the future.
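
The requirements above can be sketched as follows.  This is only an
illustration, not code from GMP: the names @code{checked_alloc},
@code{checked_realloc}, @code{checked_free} and @code{out_of_memory} are
hypothetical, and only the three function signatures are taken from the
declarations above.  Each function either performs its operation or terminates
the program; none of them returns @code{NULL}, uses @code{longjmp}, or throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical fatal-error action; a real application might log or show
// a dialog here, but it must not return to the caller.
static void out_of_memory(std::size_t wanted) {
    std::fprintf(stderr, "fatal: cannot allocate %zu bytes\n", wanted);
    std::abort();
}

// Matches the allocate_function declaration.
extern "C" void *checked_alloc(std::size_t alloc_size) {
    // Defensive assumption: never pass 0 to malloc, which may return NULL.
    if (alloc_size == 0) alloc_size = 1;
    void *p = std::malloc(alloc_size);
    if (p == nullptr) out_of_memory(alloc_size);
    return p;
}

// Matches the reallocate_function declaration; old_size is not needed here.
extern "C" void *checked_realloc(void *ptr, std::size_t old_size,
                                 std::size_t new_size) {
    assert(ptr != nullptr);      // ptr is documented as never NULL
    (void) old_size;
    if (new_size == 0) new_size = 1;
    void *p = std::realloc(ptr, new_size);   // copies min(old, new) bytes
    if (p == nullptr) out_of_memory(new_size);
    return p;
}

// Matches the free_function declaration; size is not needed here.
extern "C" void checked_free(void *ptr, std::size_t size) {
    assert(ptr != nullptr);
    (void) size;
    std::free(ptr);
}
```

In a program linked with GMP these would be installed with
@code{mp_set_memory_functions (checked_alloc, checked_realloc, checked_free)}
before any other GMP call.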
  7370  
  7371  GMP may use allocated blocks to hold pointers to other allocated blocks.  This
  7372  will limit the assumptions a conservative garbage collection scheme can make.
  7373  
  7374  Since the default GMP allocation uses @code{malloc} and friends, those
  7375  functions will be linked in even if the first thing a program does is an
  7376  @code{mp_set_memory_functions}.  It's necessary to change the GMP sources if
  7377  this is a problem.
  7378  
  7379  @sp 1
  7380  @deftypefun void mp_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (**@var{free_func_ptr}) (void *, size_t))
  7381  Get the current allocation functions, storing function pointers to the
  7382  locations given by the arguments.  If an argument is @code{NULL}, that
  7383  function pointer is not stored.
  7384  
  7385  @need 1000
  7386  For example, to get just the current free function,
  7387  
  7388  @example
  7389  void (*freefunc) (void *, size_t);
  7390  
  7391  mp_get_memory_functions (NULL, NULL, &freefunc);
  7392  @end example
  7393  @end deftypefun
  7394  
  7395  @node Language Bindings, Algorithms, Custom Allocation, Top
  7396  @chapter Language Bindings
  7397  @cindex Language bindings
  7398  @cindex Other languages
  7399  
  7400  The following packages and projects offer access to GMP from languages other
  7401  than C, though perhaps with varying levels of functionality and efficiency.
  7402  
  7403  @c  @spaceuref{U} is the same as @uref{U}, but with a couple of extra spaces
  7404  @c  in tex, just to separate the URL from the preceding text a bit.
  7405  @iftex
  7406  @macro spaceuref {U}
  7407  @ @ @uref{\U\}
  7408  @end macro
  7409  @end iftex
  7410  @ifnottex
  7411  @macro spaceuref {U}
  7412  @uref{\U\}
  7413  @end macro
  7414  @end ifnottex
  7415  
  7416  @sp 1
  7417  @table @asis
  7418  @item C++
  7419  @itemize @bullet
  7420  @item
  7421  GMP C++ class interface, @pxref{C++ Class Interface} @* Straightforward
  7422  interface, expression templates to eliminate temporaries.
  7423  @item
  7424  ALP @spaceuref{https://www-sop.inria.fr/saga/logiciels/ALP/} @* Linear algebra and
  7425  polynomials using templates.
  7426  @item
  7427  Arithmos @spaceuref{http://cant.ua.ac.be/old/arithmos/} @* Rationals
  7428  with infinities and square roots.
  7429  @item
  7430  CLN @spaceuref{http://www.ginac.de/CLN/} @* High level classes for arithmetic.
  7431  @item
  7432  Linbox @spaceuref{http://www.linalg.org/} @* Sparse vectors and matrices.
  7433  @item
  7434  NTL @spaceuref{http://www.shoup.net/ntl/} @* A C++ number theory library.
  7435  @end itemize
  7436  
  7437  @c @item D
  7438  @c @itemize @bullet
  7439  @c @item
  7440  @c gmp-d @spaceuref{http://home.comcast.net/~benhinkle/gmp-d/}
  7441  @c @end itemize
  7442  
  7443  @item Eiffel
  7444  @itemize @bullet
  7445  @item
  7446  Eiffelroom @spaceuref{http://www.eiffelroom.org/node/442}
  7447  @end itemize
  7448  
  7449  @c @item Fortran
  7450  @c @itemize @bullet
  7451  @c @item
  7452  @c Omni F77 @spaceuref{http://phase.hpcc.jp/Omni/home.html} @* Arbitrary
  7453  @c precision floats.
  7454  @c @end itemize
  7455  
  7456  @item Haskell
  7457  @itemize @bullet
  7458  @item
  7459  Glasgow Haskell Compiler @spaceuref{https://www.haskell.org/ghc/}
  7460  @end itemize
  7461  
  7462  @item Java
  7463  @itemize @bullet
  7464  @item
  7465  Kaffe @spaceuref{https://github.com/kaffe/kaffe}
  7466  @end itemize
  7467  
  7468  @item Lisp
  7469  @itemize @bullet
  7470  @item
  7471  GNU Common Lisp @spaceuref{https://www.gnu.org/software/gcl/gcl.html}
  7472  @item
  7473  Librep @spaceuref{http://librep.sourceforge.net/}
  7474  @item
  7475  @c  FIXME: When there's a stable release with gmp support, just refer to it
  7476  @c  rather than bothering to talk about betas.
  7477  XEmacs (21.5.18 beta and up) @spaceuref{http://www.xemacs.org} @* Optional
  7478  big integers, rationals and floats using GMP.
  7479  @end itemize
  7480  
  7481  @item M4
  7482  @itemize @bullet
  7483  @item
  7484  @c  FIXME: When there's a stable release with gmp support, just refer to it
  7485  @c  rather than bothering to talk about betas.
  7486  GNU m4 betas @spaceuref{http://www.seindal.dk/rene/gnu/} @* Optionally provides
  7487  an arbitrary precision @code{mpeval}.
  7488  @end itemize
  7489  
  7490  @item ML
  7491  @itemize @bullet
  7492  @item
  7493  MLton compiler @spaceuref{http://mlton.org/}
  7494  @end itemize
  7495  
  7496  @item Objective Caml
  7497  @itemize @bullet
  7498  @item
  7499  MLGMP @spaceuref{http://opam.ocamlpro.com/pkg/mlgmp.20120224.html}
  7500  @item
  7501  Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* Optionally using
  7502  GMP.
  7503  @end itemize
  7504  
  7505  @item Oz
  7506  @itemize @bullet
  7507  @item
  7508  Mozart @spaceuref{http://mozart.github.io/}
  7509  @end itemize
  7510  
  7511  @item Pascal
  7512  @itemize @bullet
  7513  @item
  7514  GNU Pascal Compiler @spaceuref{http://www.gnu-pascal.de/} @* GMP unit.
  7515  @item
  7516  Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* For Free Pascal,
  7517  optionally using GMP.
  7518  @end itemize
  7519  
  7520  @item Perl
  7521  @itemize @bullet
  7522  @item
  7523  GMP module, see @file{demos/perl} in the GMP sources (@pxref{Demonstration
  7524  Programs}).
  7525  @item
  7526  Math::GMP @spaceuref{http://www.cpan.org/} @* Compatible with Math::BigInt, but
  7527  not as many functions as the GMP module above.
  7528  @item
  7529  Math::BigInt::GMP @spaceuref{http://www.cpan.org/} @* Plug Math::GMP into
  7530  normal Math::BigInt operations.
  7531  @end itemize
  7532  
  7533  @need 1000
  7534  @item Pike
  7535  @itemize @bullet
  7536  @item
  7537  mpz module in the standard distribution, @uref{http://pike.ida.liu.se/}
  7538  @end itemize
  7539  
  7540  @need 500
  7541  @item Prolog
  7542  @itemize @bullet
  7543  @item
  7544  SWI Prolog @spaceuref{http://www.swi-prolog.org/} @*
  7545  Arbitrary precision floats.
  7546  @end itemize
  7547  
  7548  @item Python
  7549  @itemize @bullet
  7550  @item
  7551  GMPY @uref{https://code.google.com/p/gmpy/}
  7552  @end itemize
  7553  
  7554  @item Ruby
  7555  @itemize @bullet
  7556  @item
  7557  @uref{http://rubygems.org/gems/gmp}
  7558  @end itemize
  7559  
  7560  @item Scheme
  7561  @itemize @bullet
  7562  @item
  7563  GNU Guile @spaceuref{https://www.gnu.org/software/guile/guile.html}
  7564  @item
  7565  RScheme @spaceuref{http://www.rscheme.org/}
  7566  @item
  7567  STklos @spaceuref{http://www.stklos.net/}
  7568  @c
  7569  @c  For reference, MzScheme uses some of gmp, but (as of version 205) it only
  7570  @c  has copies of some of the generic C code, and we don't consider that a
  7571  @c  language binding to gmp.
  7572  @c
  7573  @end itemize
  7574  
  7575  @item Smalltalk
  7576  @itemize @bullet
  7577  @item
  7578  GNU Smalltalk @spaceuref{http://www.smalltalk.org/versions/GNUSmalltalk.html}
  7579  @end itemize
  7580  
  7581  @item Other
  7582  @itemize @bullet
  7583  @item
  7584  Axiom @uref{https://savannah.nongnu.org/projects/axiom} @* Computer algebra
  7585  using GCL.
  7586  @item
  7587  DrGenius @spaceuref{http://drgenius.seul.org/} @* Geometry system and
  7588  mathematical programming language.
  7589  @item
  7590  GiNaC @spaceuref{http://www.ginac.de/} @* C++ computer algebra using CLN.
  7591  @item
  7592  GOO @spaceuref{https://www.eecs.berkeley.edu/~jrb/goo/} @* Dynamic object oriented
  7593  language.
  7594  @item
  7595  Maxima @uref{https://www.ma.utexas.edu/users/wfs/maxima.html} @* Macsyma
  7596  computer algebra using GCL.
  7597  @c @item
  7598  @c Q @spaceuref{http://q-lang.sourceforge.net/} @* Equational programming system.
  7599  @item
  7600  Regina @spaceuref{http://regina.sourceforge.net/} @* Topological calculator.
  7601  @item
  7602  Yacas @spaceuref{http://yacas.sourceforge.net} @* Yet another computer algebra system.
  7603  @end itemize
  7604  
  7605  @end table
  7606  
  7607  
  7608  @node Algorithms, Internals, Language Bindings, Top
  7609  @chapter Algorithms
  7610  @cindex Algorithms
  7611  
  7612  This chapter is an introduction to some of the algorithms used for various GMP
  7613  operations.  The code is likely to be hard to understand without knowing
  7614  something about the algorithms.
  7615  
  7616  Some GMP internals are mentioned, but applications that expect to be
  7617  compatible with future GMP releases should take care to use only the
  7618  documented functions.
  7619  
  7620  @menu
  7621  * Multiplication Algorithms::
  7622  * Division Algorithms::
  7623  * Greatest Common Divisor Algorithms::
  7624  * Powering Algorithms::
  7625  * Root Extraction Algorithms::
  7626  * Radix Conversion Algorithms::
  7627  * Other Algorithms::
  7628  * Assembly Coding::
  7629  @end menu
  7630  
  7631  
  7632  @node Multiplication Algorithms, Division Algorithms, Algorithms, Algorithms
  7633  @section Multiplication
  7634  @cindex Multiplication algorithms
  7635  
  7636  N@cross{}N limb multiplications and squares are done using one of seven
  7637  algorithms, as the size N increases.
  7638  
  7639  @quotation
  7640  @multitable {KaratsubaMMM} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  7641  @item Algorithm @tab Threshold
  7642  @item Basecase  @tab (none)
  7643  @item Karatsuba @tab @code{MUL_TOOM22_THRESHOLD}
  7644  @item Toom-3    @tab @code{MUL_TOOM33_THRESHOLD}
  7645  @item Toom-4    @tab @code{MUL_TOOM44_THRESHOLD}
  7646  @item Toom-6.5  @tab @code{MUL_TOOM6H_THRESHOLD}
  7647  @item Toom-8.5  @tab @code{MUL_TOOM8H_THRESHOLD}
  7648  @item FFT       @tab @code{MUL_FFT_THRESHOLD}
  7649  @end multitable
  7650  @end quotation
  7651  
  7652  Similarly for squaring, with the @code{SQR} thresholds.
  7653  
  7654  N@cross{}M multiplications of operands with different sizes above
  7655  @code{MUL_TOOM22_THRESHOLD} are currently done by special Toom-inspired
  7656  algorithms or directly with FFT, depending on operand size (@pxref{Unbalanced
  7657  Multiplication}).
  7658  
  7659  @menu
  7660  * Basecase Multiplication::
  7661  * Karatsuba Multiplication::
  7662  * Toom 3-Way Multiplication::
  7663  * Toom 4-Way Multiplication::
  7664  * Higher degree Toom'n'half::
  7665  * FFT Multiplication::
  7666  * Other Multiplication::
  7667  * Unbalanced Multiplication::
  7668  @end menu
  7669  
  7670  
  7671  @node Basecase Multiplication, Karatsuba Multiplication, Multiplication Algorithms, Multiplication Algorithms
  7672  @subsection Basecase Multiplication
  7673  
  7674  Basecase N@cross{}M multiplication is a straightforward rectangular set of
  7675  cross-products, the same as long multiplication done by hand and for that
  7676  reason sometimes known as the schoolbook or grammar school method.  This is an
  7677  @m{O(NM),O(N*M)} algorithm.  See Knuth section 4.3.1 algorithm M
  7678  (@pxref{References}), and the @file{mpn/generic/mul_basecase.c} code.
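
The rectangular set of cross-products can be sketched as below.  This is not
the code in @file{mpn/generic/mul_basecase.c}; it assumes 32-bit limbs stored
least significant first in a @code{std::vector}, and the names @code{limb} and
@code{mul_basecase} are chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using limb = std::uint32_t;   // assumption: 32-bit limbs, low limb first

// Schoolbook product of u (N limbs) and v (M limbs), giving N+M limbs.
std::vector<limb> mul_basecase(const std::vector<limb> &u,
                               const std::vector<limb> &v) {
    assert(!u.empty() && !v.empty());
    std::vector<limb> r(u.size() + v.size(), 0);
    for (std::size_t j = 0; j < v.size(); ++j) {
        std::uint64_t carry = 0;
        // Add the row u * v[j] into the result at limb offset j.
        for (std::size_t i = 0; i < u.size(); ++i) {
            // Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.
            std::uint64_t t = std::uint64_t(u[i]) * v[j] + r[i + j] + carry;
            r[i + j] = limb(t);
            carry = t >> 32;
        }
        r[j + u.size()] = limb(carry);   // this limb is still untouched
    }
    return r;
}
```

The two nested loops visit each of the N*M cross-products exactly once.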
  7679  
  7680  Assembly implementations of @code{mpn_mul_basecase} are essentially the same
  7681  as the generic C code, but have all the usual assembly tricks and
  7682  obscurities introduced for speed.
  7683  
  7684  A square can be done in roughly half the time of a multiply, by using the fact
  7685  that the cross products above and below the diagonal are the same.  A triangle
  7686  of products below the diagonal is formed, doubled (left shift by one bit), and
  7687  then the products on the diagonal added.  This can be seen in
  7688  @file{mpn/generic/sqr_basecase.c}.  Again the assembly implementations take
  7689  essentially the same approach.
  7690  
  7691  @tex
  7692  \def\GMPline#1#2#3#4#5#6{%
  7693    \hbox {%
  7694      \vrule height 2.5ex depth 1ex
  7695             \hbox to 2em {\hfil{#2}\hfil}%
  7696      \vrule \hbox to 2em {\hfil{#3}\hfil}%
  7697      \vrule \hbox to 2em {\hfil{#4}\hfil}%
  7698      \vrule \hbox to 2em {\hfil{#5}\hfil}%
  7699      \vrule \hbox to 2em {\hfil{#6}\hfil}%
  7700      \vrule}}
  7701  \GMPdisplay{
  7702    \hbox{%
  7703      \vbox{%
  7704        \hbox to 1.5em {\vrule height 2.5ex depth 1ex width 0pt}%
  7705        \hbox {\vrule height 2.5ex depth 1ex width 0pt u0\hfil}%
  7706        \hbox {\vrule height 2.5ex depth 1ex width 0pt u1\hfil}%
  7707        \hbox {\vrule height 2.5ex depth 1ex width 0pt u2\hfil}%
  7708        \hbox {\vrule height 2.5ex depth 1ex width 0pt u3\hfil}%
  7709        \hbox {\vrule height 2.5ex depth 1ex width 0pt u4\hfil}%
  7710        \vfill}%
  7711      \vbox{%
  7712        \hbox{%
  7713          \hbox to 2em {\hfil u0\hfil}%
  7714          \hbox to 2em {\hfil u1\hfil}%
  7715          \hbox to 2em {\hfil u2\hfil}%
  7716          \hbox to 2em {\hfil u3\hfil}%
  7717          \hbox to 2em {\hfil u4\hfil}}%
  7718        \vskip 0.7ex
  7719        \hrule
  7720        \GMPline{u0}{d}{}{}{}{}%
  7721        \hrule
  7722        \GMPline{u1}{}{d}{}{}{}%
  7723        \hrule
  7724        \GMPline{u2}{}{}{d}{}{}%
  7725        \hrule
  7726        \GMPline{u3}{}{}{}{d}{}%
  7727        \hrule
  7728        \GMPline{u4}{}{}{}{}{d}%
  7729        \hrule}}}
  7730  @end tex
  7731  @ifnottex
  7732  @example
  7733  @group
  7734       u0  u1  u2  u3  u4
  7735     +---+---+---+---+---+
  7736  u0 | d |   |   |   |   |
  7737     +---+---+---+---+---+
  7738  u1 |   | d |   |   |   |
  7739     +---+---+---+---+---+
  7740  u2 |   |   | d |   |   |
  7741     +---+---+---+---+---+
  7742  u3 |   |   |   | d |   |
  7743     +---+---+---+---+---+
  7744  u4 |   |   |   |   | d |
  7745     +---+---+---+---+---+
  7746  @end group
  7747  @end example
  7748  @end ifnottex
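
The triangle, doubling and diagonal steps can be sketched as follows.  As with
any sketch, this is not the code in @file{mpn/generic/sqr_basecase.c}; it
assumes 32-bit limbs stored least significant first, and the names
@code{limb} and @code{sqr_basecase} are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using limb = std::uint32_t;   // assumption: 32-bit limbs, low limb first

// Square of u (n limbs), giving 2n limbs.
std::vector<limb> sqr_basecase(const std::vector<limb> &u) {
    const std::size_t n = u.size();
    assert(n > 0);
    std::vector<limb> r(2 * n, 0);

    // 1. Triangle of cross-products u[i]*u[j] with i < j.
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = i + 1; j < n; ++j) {
            std::uint64_t t = std::uint64_t(u[i]) * u[j] + r[i + j] + carry;
            r[i + j] = limb(t);
            carry = t >> 32;
        }
        r[i + n] = limb(carry);
    }

    // 2. Double the triangle with a one-bit left shift.  The triangle is
    //    less than half the square, so no bit is shifted out of the top.
    limb shifted_in = 0;
    for (std::size_t k = 0; k < 2 * n; ++k) {
        limb shifted_out = r[k] >> 31;
        r[k] = (r[k] << 1) | shifted_in;
        shifted_in = shifted_out;
    }

    // 3. Add the diagonal products u[i]^2 at limb position 2*i.
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t sq = std::uint64_t(u[i]) * u[i];
        std::uint64_t lo = std::uint64_t(r[2 * i]) + limb(sq) + carry;
        r[2 * i] = limb(lo);
        std::uint64_t hi = std::uint64_t(r[2 * i + 1]) + (sq >> 32) + (lo >> 32);
        r[2 * i + 1] = limb(hi);
        carry = hi >> 32;
    }
    return r;
}
```

Step 1 performs n*(n-1)/2 limb multiplications and step 3 another n, compared
with n*n for a general multiply, which is where the saving comes from.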
  7749  
  7750  In practice squaring isn't a full 2@cross{} faster than multiplying; it's
  7751  usually around 1.5@cross{}.  Less than 1.5@cross{} probably indicates
  7752  @code{mpn_sqr_basecase} wants improving on that CPU.
  7753  
  7754  On some CPUs @code{mpn_mul_basecase} can be faster than the generic C
  7755  @code{mpn_sqr_basecase} on some small sizes.  @code{SQR_BASECASE_THRESHOLD} is
  7756  the size at which to use @code{mpn_sqr_basecase}; this will be zero if that
  7757  routine should be used always.
  7758  
  7759  
  7760  @node Karatsuba Multiplication, Toom 3-Way Multiplication, Basecase Multiplication, Multiplication Algorithms
  7761  @subsection Karatsuba Multiplication
  7762  @cindex Karatsuba multiplication
  7763  
  7764  The Karatsuba multiplication algorithm is described in Knuth section 4.3.3
  7765  part A, and various other textbooks.  A brief description is given here.
  7766  
  7767  The inputs @math{x} and @math{y} are treated as each split into two parts of
  7768  equal length (or the most significant part one limb shorter if N is odd).
  7769  
  7770  @tex
  7771  % GMPboxwidth used for all the multiplication pictures
  7772  \global\newdimen\GMPboxwidth \global\GMPboxwidth=5em
  7773  % GMPboxdepth and GMPboxheight are also used for the float pictures
  7774  \global\newdimen\GMPboxdepth  \global\GMPboxdepth=1ex
  7775  \global\newdimen\GMPboxheight \global\GMPboxheight=2ex
  7776  \gdef\GMPvrule{\vrule height \GMPboxheight depth \GMPboxdepth}
  7777  \def\GMPbox#1#2{%
  7778    \vbox {%
  7779      \hrule
  7780      \hbox to 2\GMPboxwidth{%
  7781        \GMPvrule \hfil $#1$\hfil \vrule \hfil $#2$\hfil \vrule}%
  7782      \hrule}}
  7783  \GMPdisplay{%
  7784  \vbox{%
  7785    \hbox to 2\GMPboxwidth {high \hfil low}
  7786    \vskip 0.7ex
  7787    \GMPbox{x_1}{x_0}
  7788    \vskip 0.5ex
  7789    \GMPbox{y_1}{y_0}
  7790  }}
  7791  @end tex
  7792  @ifnottex
  7793  @example
  7794  @group
  7795   high              low
  7796  +----------+----------+
  7797  |    x1    |    x0    |
  7798  +----------+----------+
  7799  
  7800  +----------+----------+
  7801  |    y1    |    y0    |
  7802  +----------+----------+
  7803  @end group
  7804  @end example
  7805  @end ifnottex
  7806  
  7807  Let @math{b} be the power of 2 where the split occurs, i.e.@: if @ms{x,0} is
  7808  @math{k} limbs (@ms{y,0} the same) then
  7809  @m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
  7810  With that @m{x=x_1b+x_0,x=x1*b+x0} and @m{y=y_1b+y_0,y=y1*b+y0}, and the
  7811  following holds,
  7812  
  7813  @display
  7814  @m{xy = (b^2+b)x_1y_1 - b(x_1-x_0)(y_1-y_0) + (b+1)x_0y_0,
  7815    x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0}
  7816  @end display
  7817  
  7818  This formula means doing only three multiplies of (N/2)@cross{}(N/2) limbs,
  7819  whereas a basecase multiply of N@cross{}N limbs is equivalent to four
  7820  multiplies of (N/2)@cross{}(N/2).  The factors @math{(b^2+b)} etc represent
  7821  the positions where the three products must be added.
  7822  
  7823  @tex
  7824  \def\GMPboxA#1#2{%
  7825    \vbox{%
  7826      \hrule
  7827      \hbox{%
  7828        \GMPvrule
  7829        \hbox to 2\GMPboxwidth {\hfil\hbox{$#1$}\hfil}%
  7830        \vrule
  7831        \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
  7832        \vrule}
  7833      \hrule}}
  7834  \def\GMPboxB#1#2{%
  7835    \hbox{%
  7836      \raise \GMPboxdepth \hbox to \GMPboxwidth {\hfil #1\hskip 0.5em}%
  7837      \vbox{%
  7838        \hrule
  7839        \hbox{%
  7840          \GMPvrule
  7841          \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
  7842          \vrule}%
  7843        \hrule}}}
  7844  \GMPdisplay{%
  7845  \vbox{%
  7846    \hbox to 4\GMPboxwidth {high \hfil low}
  7847    \vskip 0.7ex
  7848    \GMPboxA{x_1y_1}{x_0y_0}
  7849    \vskip 0.5ex
  7850    \GMPboxB{$+$}{x_1y_1}
  7851    \vskip 0.5ex
  7852    \GMPboxB{$+$}{x_0y_0}
  7853    \vskip 0.5ex
  7854    \GMPboxB{$-$}{(x_1-x_0)(y_1-y_0)}
  7855  }}
  7856  @end tex
  7857  @ifnottex
  7858  @example
  7859  @group
  7860   high                              low
  7861  +--------+--------+ +--------+--------+
  7862  |      x1*y1      | |      x0*y0      |
  7863  +--------+--------+ +--------+--------+
  7864            +--------+--------+
  7865        add |      x1*y1      |
  7866            +--------+--------+
  7867            +--------+--------+
  7868        add |      x0*y0      |
  7869            +--------+--------+
  7870            +--------+--------+
  7871        sub | (x1-x0)*(y1-y0) |
  7872            +--------+--------+
  7873  @end group
  7874  @end example
  7875  @end ifnottex
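
A single step of the formula can be checked on small operands.  The sketch
below is not GMP code: it assumes 32-bit inputs split into 16-bit halves, so
that @math{b=2^16}, and the name @code{karatsuba32} is illustrative.  The
middle term is computed as an absolute value, with the sign deciding whether it
is added or subtracted.

```cpp
#include <cassert>
#include <cstdint>

// One Karatsuba step: x = x1*b + x0, y = y1*b + y0, with b = 2^16.
std::uint64_t karatsuba32(std::uint32_t x, std::uint32_t y) {
    const std::uint64_t x1 = x >> 16, x0 = x & 0xFFFF;
    const std::uint64_t y1 = y >> 16, y0 = y & 0xFFFF;

    // The three half-size multiplications.
    const std::uint64_t hi = x1 * y1;
    const std::uint64_t lo = x0 * y0;
    const std::uint64_t dx = x1 >= x0 ? x1 - x0 : x0 - x1;   // |x1 - x0|
    const std::uint64_t dy = y1 >= y0 ? y1 - y0 : y0 - y1;   // |y1 - y0|
    const std::uint64_t mid = dx * dy;

    // (x1-x0)*(y1-y0) is negative exactly when the two signs differ.
    const bool negative = (x1 >= x0) != (y1 >= y0);

    // (b^2+b)*x1*y1 + (b+1)*x0*y0, which is below 2^64.
    std::uint64_t r = (hi << 32) + (hi << 16) + (lo << 16) + lo;

    // - b*(x1-x0)*(y1-y0): subtract, or add when the term is negative.
    if (negative)
        r += mid << 16;
    else
        r -= mid << 16;
    return r;
}
```

For every pair of 32-bit inputs the result should equal the ordinary 64-bit
product, which makes the function easy to test against direct multiplication.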
  7876  
  7877  The term @m{(x_1-x_0)(y_1-y_0),(x1-x0)*(y1-y0)} is best calculated as an
  7878  absolute value, and the sign used to choose to add or subtract.  Notice the
  7879  sum @m{\mathop{\rm high}(x_0y_0)+\mathop{\rm low}(x_1y_1),
  7880  high(x0*y0)+low(x1*y1)} occurs twice, so it's possible to do @m{5k,5*k} limb
  7881  additions, rather than @m{6k,6*k}, but in GMP extra function call overheads
  7882  outweigh the saving.
  7883  
  7884  Squaring is similar to multiplying, but with @math{x=y} the formula reduces to
  7885  an equivalent with three squares,
  7886  
  7887  @display
  7888  @m{x^2 = (b^2+b)x_1^2 - b(x_1-x_0)^2 + (b+1)x_0^2,
  7889     x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2}
  7890  @end display
  7891  
  7892  The final result is accumulated from those three squares the same way as for
  7893  the three multiplies above.  The middle term @m{(x_1-x_0)^2,(x1-x0)^2} is now
  7894  always positive.
  7895  
  7896  A similar formula for both multiplying and squaring can be constructed with a
  7897  middle term @m{(x_1+x_0)(y_1+y_0),(x1+x0)*(y1+y0)}.  But those sums can exceed
  7898  @math{k} limbs, leading to more carry handling and additions than the form
  7899  above.
  7900  
  7901  Karatsuba multiplication is asymptotically an @math{O(N^@W{1.585})} algorithm,
  7902  the exponent being @m{\log3/\log2,log(3)/log(2)}, representing 3 multiplies
  7903  each @math{1/2} the size of the inputs.  This is a big improvement over the
  7904  basecase multiply at @math{O(N^2)} and the advantage soon overcomes the extra
  7905  additions Karatsuba performs.  @code{MUL_TOOM22_THRESHOLD} can be as little
  7906  as 10 limbs.  The @code{SQR} threshold is usually about twice the @code{MUL}.
  7907  
  7908  The basecase algorithm will take a time of the form @m{M(N) = aN^2 + bN + c,
  7909  M(N) = a*N^2 + b*N + c} and the Karatsuba algorithm @m{K(N) = 3M(N/2) + dN +
  7910  e, K(N) = 3*M(N/2) + d*N + e}, which expands to @m{K(N) = {3\over4} aN^2 +
  7911  {3\over2} bN + 3c + dN + e, K(N) = 3/4*a*N^2 + 3/2*b*N + 3*c + d*N + e}.  The
  7912  factor @m{3\over4, 3/4} for @math{a} means per-crossproduct speedups in the
  7913  basecase code will increase the threshold since they benefit @math{M(N)} more
  7914  than @math{K(N)}.  And conversely the @m{3\over2, 3/2} for @math{b} means
  7915  linear style speedups of @math{b} will decrease the threshold since they
  7916  benefit @math{K(N)} more than @math{M(N)}.  The latter can be seen for
  7917  instance when adding an optimized @code{mpn_sqr_diagonal} to
  7918  @code{mpn_sqr_basecase}.  Of course all speedups reduce total time, and in
  7919  that sense the algorithm thresholds are merely of academic interest.
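
The effect of the two coefficients can be tried out on a toy cost model.  The
sketch below uses the @math{M(N)} and @math{K(N)} forms from the text; the
constants passed in are invented purely for illustration and don't describe
any real CPU, and the function names are hypothetical.

```python
def basecase_time(n, a, b, c):
    """M(N) = a*N^2 + b*N + c."""
    return a * n * n + b * n + c

def karatsuba_time(n, a, b, c, d, e):
    """K(N) = 3*M(N/2) + d*N + e."""
    return 3 * basecase_time(n / 2, a, b, c) + d * n + e

def threshold(a, b, c, d, e, limit=10000):
    """Smallest N at which the Karatsuba model beats the basecase model."""
    for n in range(2, limit):
        if karatsuba_time(n, a, b, c, d, e) < basecase_time(n, a, b, c):
            return n
    return None

# Invented constants, for illustration only.
base = threshold(a=1.0, b=4.0, c=10.0, d=6.0, e=20.0)
faster_a = threshold(a=0.8, b=4.0, c=10.0, d=6.0, e=20.0)  # cheaper cross products
faster_b = threshold(a=1.0, b=3.0, c=10.0, d=6.0, e=20.0)  # cheaper linear term
```

In this model a smaller @math{a} moves the crossover up and a smaller
@math{b} moves it down, while every speedup still lowers the total time on
both sides.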
  7920  
  7921  
  7922  @node Toom 3-Way Multiplication, Toom 4-Way Multiplication, Karatsuba Multiplication, Multiplication Algorithms
  7923  @subsection Toom 3-Way Multiplication
  7924  @cindex Toom multiplication
  7925  
  7926  The Karatsuba formula is the simplest case of a general approach to splitting
  7927  inputs that leads to both Toom and FFT algorithms.  A description of
  7928  Toom can be found in Knuth section 4.3.3, with an example 3-way
  7929  calculation after Theorem A@.  The 3-way form used in GMP is described here.
  7930  
  7931  The operands are each considered split into 3 pieces of equal length (or the
  7932  most significant part 1 or 2 limbs shorter than the other two).
  7933  
  7934  @tex
  7935  \def\GMPbox#1#2#3{%
  7936    \vbox{%
  7937      \hrule \vfil
  7938      \hbox to 3\GMPboxwidth {%
  7939        \GMPvrule
  7940        \hfil$#1$\hfil
  7941        \vrule
  7942        \hfil$#2$\hfil
  7943        \vrule
  7944        \hfil$#3$\hfil
  7945        \vrule}%
  7946      \vfil \hrule
  7947  }}
  7948  \GMPdisplay{%
  7949  \vbox{%
  7950    \hbox to 3\GMPboxwidth {high \hfil low}
  7951    \vskip 0.7ex
  7952    \GMPbox{x_2}{x_1}{x_0}
  7953    \vskip 0.5ex
  7954    \GMPbox{y_2}{y_1}{y_0}
  7955    \vskip 0.5ex
  7956  }}
  7957  @end tex
  7958  @ifnottex
  7959  @example
  7960  @group
  7961   high                         low
  7962  +----------+----------+----------+
  7963  |    x2    |    x1    |    x0    |
  7964  +----------+----------+----------+
  7965  
  7966  +----------+----------+----------+
  7967  |    y2    |    y1    |    y0    |
  7968  +----------+----------+----------+
  7969  @end group
  7970  @end example
  7971  @end ifnottex
  7972  
  7973  @noindent
  7974  These parts are treated as the coefficients of two polynomials
  7975  
  7976  @display
  7977  @group
  7978  @m{X(t) = x_2t^2 + x_1t + x_0,
  7979     X(t) = x2*t^2 + x1*t + x0}
  7980  @m{Y(t) = y_2t^2 + y_1t + y_0,
  7981     Y(t) = y2*t^2 + y1*t + y0}
  7982  @end group
  7983  @end display
  7984  
  7985  Let @math{b} equal the power of 2 which is the size of the @ms{x,0}, @ms{x,1},
  7986  @ms{y,0} and @ms{y,1} pieces, i.e.@: if they're @math{k} limbs each then
  7987  @m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
  7988  With this @math{x=X(b)} and @math{y=Y(b)}.
  7989  
  7990  Let a polynomial @m{W(t)=X(t)Y(t),W(t)=X(t)*Y(t)} and suppose its coefficients
  7991  are
  7992  
  7993  @display
  7994  @m{W(t) = w_4t^4 + w_3t^3 + w_2t^2 + w_1t + w_0,
  7995     W(t) = w4*t^4 + w3*t^3 + w2*t^2 + w1*t + w0}
  7996  @end display
  7997  
  7998  The @m{w_i,w[i]} are going to be determined, and when they are they'll give
  7999  the final result using @math{w=W(b)}, since
  8000  @m{xy=X(b)Y(b)=W(b),x*y=X(b)*Y(b)=W(b)}.  The coefficients will be roughly
  8001  @math{b^2} each, and the final @math{W(b)} will be an addition like,
  8002  
  8003  @tex
  8004  \def\GMPbox#1#2{%
  8005    \moveright #1\GMPboxwidth
  8006    \vbox{%
  8007      \hrule
  8008      \hbox{%
  8009        \GMPvrule
  8010        \hbox to 2\GMPboxwidth {\hfil$#2$\hfil}%
  8011        \vrule}%
  8012      \hrule
  8013  }}
  8014  \GMPdisplay{%
  8015  \vbox{%
  8016    \hbox to 6\GMPboxwidth {high \hfil low}%
  8017    \vskip 0.7ex
  8018    \GMPbox{0}{w_4}
  8019    \vskip 0.5ex
  8020    \GMPbox{1}{w_3}
  8021    \vskip 0.5ex
  8022    \GMPbox{2}{w_2}
  8023    \vskip 0.5ex
  8024    \GMPbox{3}{w_1}
  8025    \vskip 0.5ex
  8026    \GMPbox{4}{w_0}
  8027  }}
  8028  @end tex
  8029  @ifnottex
  8030  @example
  8031  @group
  8032   high                                        low
  8033  +-------+-------+
  8034  |       w4      |
  8035  +-------+-------+
  8036         +--------+-------+
  8037         |        w3      |
  8038         +--------+-------+
  8039                 +--------+-------+
  8040                 |        w2      |
  8041                 +--------+-------+
  8042                         +--------+-------+
  8043                         |        w1      |
  8044                         +--------+-------+
  8045                                  +-------+-------+
  8046                                  |       w0      |
  8047                                  +-------+-------+
  8048  @end group
  8049  @end example
  8050  @end ifnottex
  8051  
  8052  The @m{w_i,w[i]} coefficients could be formed by a simple set of cross
  8053  products, like @m{w_4=x_2y_2,w4=x2*y2}, @m{w_3=x_2y_1+x_1y_2,w3=x2*y1+x1*y2},
  8054  @m{w_2=x_2y_0+x_1y_1+x_0y_2,w2=x2*y0+x1*y1+x0*y2} etc, but this would need all
  8055  nine @m{x_iy_j,x[i]*y[j]} for @math{i,j=0,1,2}, and would be equivalent merely
  8056  to a basecase multiply.  Instead the following approach is used.
  8057  
  8058  @math{X(t)} and @math{Y(t)} are evaluated and multiplied at 5 points, giving
  8059  values of @math{W(t)} at those points.  In GMP the following points are used,
  8060  
  8061  @quotation
  8062  @multitable {@m{t=\infty,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  8063  @item Point                 @tab Value
  8064  @item @math{t=0}            @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
  8065  @item @math{t=1}            @tab @m{(x_2+x_1+x_0)(y_2+y_1+y_0),(x2+x1+x0) * (y2+y1+y0)}
  8066  @item @math{t=-1}           @tab @m{(x_2-x_1+x_0)(y_2-y_1+y_0),(x2-x1+x0) * (y2-y1+y0)}
  8067  @item @math{t=2}            @tab @m{(4x_2+2x_1+x_0)(4y_2+2y_1+y_0),(4*x2+2*x1+x0) * (4*y2+2*y1+y0)}
  8068  @item @m{t=\infty,t=inf}    @tab @m{x_2y_2,x2 * y2}, which gives @ms{w,4} immediately
  8069  @end multitable
  8070  @end quotation
  8071  
  8072  At @math{t=-1} the values can be negative and that's handled using the
  8073  absolute values and tracking the sign separately.  At @m{t=\infty,t=inf} the
  8074  value is actually @m{\lim_{t\to\infty} {X(t)Y(t)\over t^4}, X(t)*Y(t)/t^4 in
  8075  the limit as t approaches infinity}, but it's much easier to think of as
  8076  simply @m{x_2y_2,x2*y2} giving @ms{w,4} immediately (much like
  8077  @m{x_0y_0,x0*y0} at @math{t=0} gives @ms{w,0} immediately).
  8078  
  8079  Each of the points substituted into
  8080  @m{W(t)=w_4t^4+\cdots+w_0,W(t)=w4*t^4+@dots{}+w0} gives a linear combination
  8081  of the @m{w_i,w[i]} coefficients, and the value of those combinations has just
  8082  been calculated.
  8083  
  8084  @tex
  8085  \GMPdisplay{%
  8086  $\matrix{%
  8087  W(0)      & = &       &   &      &   &      &   &      &   & w_0 \cr
  8088  W(1)      & = &   w_4 & + &  w_3 & + &  w_2 & + &  w_1 & + & w_0 \cr
  8089  W(-1)     & = &   w_4 & - &  w_3 & + &  w_2 & - &  w_1 & + & w_0 \cr
  8090  W(2)      & = & 16w_4 & + & 8w_3 & + & 4w_2 & + & 2w_1 & + & w_0 \cr
  8091  W(\infty) & = &   w_4 \cr
  8092  }$}
  8093  @end tex
  8094  @ifnottex
  8095  @example
  8096  @group
  8097  W(0)   =                              w0
  8098  W(1)   =    w4 +   w3 +   w2 +   w1 + w0
  8099  W(-1)  =    w4 -   w3 +   w2 -   w1 + w0
  8100  W(2)   = 16*w4 + 8*w3 + 4*w2 + 2*w1 + w0
  8101  W(inf) =    w4
  8102  @end group
  8103  @end example
  8104  @end ifnottex
  8105  
  8106  This is a set of five equations in five unknowns, and some elementary linear
  8107  algebra quickly isolates each @m{w_i,w[i]}.  This involves adding or
  8108  subtracting one @math{W(t)} value from another, and a couple of divisions by
  8109  powers of 2 and one division by 3, the latter using the special
  8110  @code{mpn_divexact_by3} (@pxref{Exact Division}).
  8111  
  8112  The conversion of @math{W(t)} values to the coefficients is interpolation.  A
  8113  polynomial of degree 4 like @math{W(t)} is uniquely determined by values known
  8114  at 5 different points.  The points are arbitrary and can be chosen to make the
  8115  linear equations come out with a convenient set of steps for quickly isolating
  8116  the @m{w_i,w[i]}.
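
The evaluation and interpolation steps above can be sketched in Python as
follows.  This is one possible ordering of the linear algebra, chosen for
readability, and isn't claimed to be the sequence used by
@code{toom_interpolate_5pts}.  The name @code{toom3_mul} and its
@code{k_bits} parameter are hypothetical, Python's signed integers replace
the separate sign tracking, and the five products use the built-in multiply
where a real implementation would recurse.

```python
def toom3_mul(x, y, k_bits):
    """Multiply non-negative x and y with a 3-way split, k_bits per piece."""
    b = 1 << k_bits
    mask = b - 1
    x0, x1, x2 = x & mask, (x >> k_bits) & mask, x >> (2 * k_bits)
    y0, y1, y2 = y & mask, (y >> k_bits) & mask, y >> (2 * k_bits)

    # Values of W(t) at the five points: the only five multiplies.
    v_0 = x0 * y0
    v_1 = (x2 + x1 + x0) * (y2 + y1 + y0)
    v_m1 = (x2 - x1 + x0) * (y2 - y1 + y0)
    v_2 = (4 * x2 + 2 * x1 + x0) * (4 * y2 + 2 * y1 + y0)
    v_inf = x2 * y2

    # Interpolation: solve the five linear equations for w0..w4.
    w0, w4 = v_0, v_inf
    w2 = (v_1 + v_m1) // 2 - w4 - w0     # from W(1) + W(-1)
    s = (v_1 - v_m1) // 2                # w3 + w1
    t = v_2 - 16 * w4 - 4 * w2 - w0      # 8*w3 + 2*w1
    w3 = (t - 2 * s) // 2 // 3           # exact division by 2, then by 3
    w1 = s - w3

    # W(b): add the coefficients at their offsets.
    return (((w4 * b + w3) * b + w2) * b + w1) * b + w0
```

All the divisions are exact, so floor division gives the right answer even
when an intermediate value is negative.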
  8117  
  8118  Squaring follows the same procedure as multiplication, but there's only one
  8119  @math{X(t)} and it's evaluated at the 5 points, and those values squared to
  8120  give values of @math{W(t)}.  The interpolation is then identical, and in fact
  8121  the same @code{toom_interpolate_5pts} subroutine is used for both squaring and
  8122  multiplying.
  8123  
  8124  Toom-3 is asymptotically @math{O(N^@W{1.465})}, the exponent being
  8125  @m{\log5/\log3,log(5)/log(3)}, representing 5 recursive multiplies of 1/3 the
  8126  original size each.  This is an improvement over Karatsuba at
  8127  @math{O(N^@W{1.585})}, though Toom does more work in the evaluation and
  8128  interpolation and so it only realizes its advantage above a certain size.
  8129  
  8130  Near the crossover between Toom-3 and Karatsuba there's generally a range of
  8131  sizes where the difference between the two is small.
  8132  @code{MUL_TOOM33_THRESHOLD} is a somewhat arbitrary point in that range and
  8133  successive runs of the tune program can give different values due to small
  8134  variations in measuring.  A graph of time versus size for the two shows the
  8135  effect, see @file{tune/README}.
  8136  
  8137  At the fairly small sizes where the Toom-3 thresholds occur it's worth
  8138  remembering that the asymptotic behaviour for Karatsuba and Toom-3 can't be
  8139  expected to make accurate predictions, due of course to the big influence of
  8140  all sorts of overheads, and the fact that only a few recursions of each are
  8141  being performed.  Even at large sizes there's a good chance machine dependent
  8142  effects like cache architecture will mean actual performance deviates from
  8143  what might be predicted.
  8144  
  8145  The formula given for the Karatsuba algorithm (@pxref{Karatsuba
  8146  Multiplication}) has an equivalent for Toom-3 involving only five multiplies,
  8147  but this would be complicated and unenlightening.
  8148  
  8149  An alternate view of Toom-3 can be found in Zuras (@pxref{References}), using
  8150  a vector to represent the @math{x} and @math{y} splits and a matrix
  8151  multiplication for the evaluation and interpolation stages.  The matrix
  8152  inverses are not meant to be actually used, and they have elements with values
  8153  much greater than in fact arise in the interpolation steps.  The diagram shown
  8154  for the 3-way is attractive, but again doesn't have to be implemented that way
  8155  and for example with a bit of rearrangement just one division by 6 can be
  8156  done.
  8157  
  8158  
  8159  @node Toom 4-Way Multiplication, Higher degree Toom'n'half, Toom 3-Way Multiplication, Multiplication Algorithms
  8160  @subsection Toom 4-Way Multiplication
  8161  @cindex Toom multiplication
  8162  
  8163  Karatsuba and Toom-3 split the operands into 2 and 3 coefficients,
  8164  respectively.  Toom-4 analogously splits the operands into 4 coefficients.
  8165  Using the notation from the section on Toom-3 multiplication, we form two
  8166  polynomials:
  8167  
  8168  @display
  8169  @group
  8170  @m{X(t) = x_3t^3 + x_2t^2 + x_1t + x_0,
  8171     X(t) = x3*t^3 + x2*t^2 + x1*t + x0}
  8172  @m{Y(t) = y_3t^3 + y_2t^2 + y_1t + y_0,
  8173     Y(t) = y3*t^3 + y2*t^2 + y1*t + y0}
  8174  @end group
  8175  @end display
  8176  
  8177  @math{X(t)} and @math{Y(t)} are evaluated and multiplied at 7 points, giving
  8178  values of @math{W(t)} at those points.  In GMP the following points are used,
  8179  
  8180  @quotation
  8181  @multitable {@m{t=-1/2,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
  8182  @item Point              @tab Value
  8183  @item @math{t=0}         @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
  8184  @item @math{t=1/2}       @tab @m{(x_3+2x_2+4x_1+8x_0)(y_3+2y_2+4y_1+8y_0),(x3+2*x2+4*x1+8*x0) * (y3+2*y2+4*y1+8*y0)}
  8185  @item @math{t=-1/2}      @tab @m{(-x_3+2x_2-4x_1+8x_0)(-y_3+2y_2-4y_1+8y_0),(-x3+2*x2-4*x1+8*x0) * (-y3+2*y2-4*y1+8*y0)}
  8186  @item @math{t=1}         @tab @m{(x_3+x_2+x_1+x_0)(y_3+y_2+y_1+y_0),(x3+x2+x1+x0) * (y3+y2+y1+y0)}
  8187  @item @math{t=-1}        @tab @m{(-x_3+x_2-x_1+x_0)(-y_3+y_2-y_1+y_0),(-x3+x2-x1+x0) * (-y3+y2-y1+y0)}
  8188  @item @math{t=2}         @tab @m{(8x_3+4x_2+2x_1+x_0)(8y_3+4y_2+2y_1+y_0),(8*x3+4*x2+2*x1+x0) * (8*y3+4*y2+2*y1+y0)}
  8189  @item @m{t=\infty,t=inf} @tab @m{x_3y_3,x3 * y3}, which gives @ms{w,6} immediately
  8190  @end multitable
  8191  @end quotation
  8192  
  8193  The number of additions and subtractions for Toom-4 is much larger than for Toom-3.
  8194  But several subexpressions occur multiple times, for example @m{x_2+x_0,x2+x0} occurs
  8195  for both @math{t=1} and @math{t=-1}.
  8196  
  8197  Toom-4 is asymptotically @math{O(N^@W{1.404})}, the exponent being
  8198  @m{\log7/\log4,log(7)/log(4)}, representing 7 recursive multiplies of 1/4 the
  8199  original size each.
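
As a rough check of the table of points, the following Python sketch
evaluates at the seven points in the scaled forms shown and recovers the
seven coefficients by exact Gauss-Jordan elimination over fractions.  That
generic solver is a stand-in chosen for brevity and is far slower than a
hand-arranged interpolation sequence; the shared subexpressions aren't
exploited either.  All names here are hypothetical.

```python
from fractions import Fraction

# Evaluation points t = p/q; (1, 0) stands for t = infinity.
POINTS = [(0, 1), (1, 2), (-1, 2), (1, 1), (-1, 1), (2, 1), (1, 0)]

def evaluate(c, p, q):
    """Scaled evaluation q^3 * C(p/q) = sum of c[i] * p^i * q^(3-i)."""
    return sum(c[i] * p ** i * q ** (3 - i) for i in range(4))

def solve(rows, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rows)
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(rows, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def toom4_mul(x, y, k_bits):
    """Multiply non-negative x and y with a 4-way split, k_bits per piece."""
    mask = (1 << k_bits) - 1
    xs = [(x >> (i * k_bits)) & mask for i in range(3)] + [x >> (3 * k_bits)]
    ys = [(y >> (i * k_bits)) & mask for i in range(3)] + [y >> (3 * k_bits)]
    # Seven pointwise multiplies, each giving q^6 * W(p/q).
    values = [evaluate(xs, p, q) * evaluate(ys, p, q) for p, q in POINTS]
    # Row for each point: coefficients of w0..w6 in q^6 * W(p/q).
    rows = [[p ** i * q ** (6 - i) for i in range(7)] for p, q in POINTS]
    w = solve(rows, values)
    return sum(int(w[i]) << (i * k_bits) for i in range(7))
```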
  8200  
  8201  
  8202  @node Higher degree Toom'n'half, FFT Multiplication, Toom 4-Way Multiplication, Multiplication Algorithms
  8203  @subsection Higher degree Toom'n'half
  8204  @cindex Toom multiplication
  8205  
  8206  The Toom algorithms described above (@pxref{Toom 3-Way Multiplication},
  8207  @pxref{Toom 4-Way Multiplication}) generalize to split into an arbitrary
  8208  number of pieces.  In general a split of two equally long operands into
  8209  @math{r} pieces leads to evaluations and pointwise multiplications done at
  8210  @m{2r-1,2*r-1} points.  To fully exploit symmetries it would be better to have
  8211  a multiple of 4 points, which is why Toom'n'half is used for higher degrees.
  8212  
  8213  Toom'n'half means that the existence of one more piece is considered for a
  8214  single operand.  It can be virtual, i.e.@: zero, or real, when the two operands
  8215  are not exactly balanced.  By choosing an even @math{r},
  8216  Toom-@m{r{1\over2},r+1/2} requires @math{2r} points, a multiple of four.
  8217  
  8218  The four-plets of points include 0, @m{\infty,inf}, +1, -1 and
  8219  @m{\pm2^i,+-2^i}, @m{\pm2^{-i},+-2^-i}.  Each of them gives shortcuts for the
  8220  evaluation phase and for some steps in the interpolation phase.  Further tricks
  8221  are used to reduce the memory footprint of the whole multiplication algorithm
  8222  to a memory buffer equal in size to the result of the product.
  8223  
  8224  Current GMP uses both Toom-6'n'half and Toom-8'n'half.
  8225  
  8226  
  8227  @node FFT Multiplication, Other Multiplication, Higher degree Toom'n'half, Multiplication Algorithms
  8228  @subsection FFT Multiplication
  8229  @cindex FFT multiplication
  8230  @cindex Fast Fourier Transform
  8231  
  8232  At large to very large sizes a Fermat style FFT multiplication is used,
  8233  following Sch@"onhage and Strassen (@pxref{References}).  Descriptions of FFTs
  8234  in various forms can be found in many textbooks, for instance Knuth section
  8235  4.3.3 part C or Lipson chapter IX@.  A brief description of the form used in
  8236  GMP is given here.
  8237  
  8238  The multiplication done is @m{xy \bmod 2^N+1, x*y mod 2^N+1}, for a given
  8239  @math{N}.  A full product @m{xy,x*y} is obtained by choosing @m{N \ge
  8240  \mathop{\rm bits}(x)+\mathop{\rm bits}(y), N>=bits(x)+bits(y)} and padding
  8241  @math{x} and @math{y} with high zero limbs.  The modular product is the native
  8242  form for the algorithm, so padding to get a full product is unavoidable.
  8243  
  8244  The algorithm follows a split, evaluate, pointwise multiply, interpolate and
  8245  combine similar to that described above for Karatsuba and Toom-3.  A @math{k}
  8246  parameter controls the split, with an FFT-@math{k} splitting into @math{2^k}
  8247  pieces of @math{M=N/2^k} bits each.  @math{N} must be a multiple of
  8248  @m{2^k\times@code{mp\_bits\_per\_limb}, (2^k)*@nicode{mp_bits_per_limb}} so
  8249  the split falls on limb boundaries, avoiding bit shifts in the split and
  8250  combine stages.
  8251  
  8252  The evaluations, pointwise multiplications, and interpolation, are all done
  8253  modulo @m{2^{N'}+1, 2^N'+1} where @math{N'} is @math{2M+k+3} rounded up to a
  8254  multiple of @math{2^k} and of @code{mp_bits_per_limb}.  The results of
  8255  interpolation will be the following negacyclic convolution of the input
  8256  pieces, and the choice of @math{N'} ensures these sums aren't truncated.
  8257  @tex
  8258  $$ w_n = \sum_{{i+j = b2^k+n}\atop{b=0,1}} (-1)^b x_i y_j $$
  8259  @end tex
  8260  @ifnottex
  8261  
  8262  @example
  8263             ---
  8264             \         b
  8265  w[n] =     /     (-1) * x[i] * y[j]
  8266             ---
  8267         i+j==b*2^k+n
  8268            b=0,1
  8269  @end example
  8270  
  8271  @end ifnottex
  8272  The points used for the evaluation are @math{g^i} for @math{i=0} to
  8273  @math{2^k-1} where @m{g=2^{2N'/2^k}, g=2^(2N'/2^k)}.  @math{g} is a
  8274  @m{2^k,2^k'}th root of unity mod @m{2^{N'}+1,2^N'+1}, which produces necessary
  8275  cancellations at the interpolation stage, and it's also a power of 2 so the
  8276  fast Fourier transforms used for the evaluation and interpolation do only
  8277  shifts, adds and negations.
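
The negacyclic convolution and the root of unity can be checked on a small
case in Python.  The sketch forms the convolution directly with a double
loop, so it shows what the transforms compute and not how they compute it.
The function names and the @code{m_bits} parameter are hypothetical, and the
inputs are assumed to be below @math{2^N}.

```python
def negacyclic_mul(x, y, k, m_bits):
    """x*y mod 2^N+1, with N = m_bits * 2^k, via the negacyclic convolution."""
    pieces = 1 << k
    mask = (1 << m_bits) - 1
    xs = [(x >> (i * m_bits)) & mask for i in range(pieces)]
    ys = [(y >> (j * m_bits)) & mask for j in range(pieces)]
    w = [0] * pieces
    for i in range(pieces):
        for j in range(pieces):
            if i + j < pieces:
                w[i + j] += xs[i] * ys[j]             # b = 0 terms
            else:
                w[i + j - pieces] -= xs[i] * ys[j]    # b = 1 terms, sign (-1)^1
    n_bits = m_bits * pieces
    total = sum(w[n] << (n * m_bits) for n in range(pieces))
    return total % ((1 << n_bits) + 1)

def root_of_unity(k, n_prime):
    """g = 2^(2*N'/2^k); n_prime is assumed to be a multiple of 2^k."""
    return 1 << ((2 * n_prime) >> k)
```

The wrapped-around terms pick up a minus sign because @math{2^N} is
congruent to @math{-1} modulo @math{2^N+1}.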
  8278  
  8279  The pointwise multiplications are done modulo @m{2^{N'}+1, 2^N'+1} and either
  8280  recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or
  8281  basecase), whichever is optimal at the size @math{N'}.  The interpolation is
  8282  an inverse fast Fourier transform.  The resulting set of sums of @m{x_iy_j,
  8283  x[i]*y[j]} are added at appropriate offsets to give the final result.
  8284  
  8285  Squaring is the same, but @math{x} is the only input so it's one transform at
  8286  the evaluate stage and the pointwise multiplies are squares.  The
  8287  interpolation is the same.
  8288  
  8289  For a mod @math{2^N+1} product, an FFT-@math{k} is an @m{O(N^{k/(k-1)}),
  8290  O(N^(k/(k-1)))} algorithm, the exponent representing @math{2^k} recursed
  8291  modular multiplies each @m{1/2^{k-1},1/2^(k-1)} the size of the original.
  8292  Each successive @math{k} is an asymptotic improvement, but overheads mean each
  8293  is only faster at bigger and bigger sizes.  In the code, @code{MUL_FFT_TABLE}
  8294  and @code{SQR_FFT_TABLE} are the thresholds where each @math{k} is used.  Each
  8295  new @math{k} effectively swaps some multiplying for some shifts, adds and
  8296  overheads.
  8297  
  8298  A mod @math{2^N+1} product can be formed with a normal
  8299  @math{N@cross{}N@rightarrow{}2N} bit multiply plus a subtraction, so an FFT
  8300  and Toom-3 etc can be compared directly.  A @math{k=4} FFT at
  8301  @math{O(N^@W{1.333})} can be expected to be the first faster than Toom-3 at
  8302  @math{O(N^@W{1.465})}.  In practice this is what's found, with
  8303  @code{MUL_FFT_MODF_THRESHOLD} and @code{SQR_FFT_MODF_THRESHOLD} being between
  8304  300 and 1000 limbs, depending on the CPU@.  So far it's been found that only
  8305  very large FFTs recurse into pointwise multiplies above these sizes.
  8306  
  8307  When an FFT is to give a full product, the change of @math{N} to @math{2N}
  8308  doesn't alter the theoretical complexity for a given @math{k}, but for the
  8309  purposes of considering where an FFT might be first used it can be assumed
  8310  that the FFT is recursing into a normal multiply and that on that basis it's
  8311  doing @math{2^k} recursed multiplies each @m{1/2^{k-2},1/2^(k-2)} the size of
  8312  the inputs, making it @m{O(N^{k/(k-2)}), O(N^(k/(k-2)))}.  This would mean
  8313  @math{k=7} at @math{O(N^@W{1.4})} would be the first FFT faster than Toom-3.
  8314  In practice @code{MUL_FFT_THRESHOLD} and @code{SQR_FFT_THRESHOLD} have been
  8315  found to be in the @math{k=8} range, somewhere between 3000 and 10000 limbs.
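
The two exponent formulas are easy to tabulate.  This short Python sketch
(function names invented here) finds the first @math{k} whose exponent beats
Toom-3 under each model; it says nothing about the measured thresholds, which
depend on overheads.

```python
import math

def fft_exponent_modular(k):
    """Exponent k/(k-1) for a mod 2^N+1 product."""
    return k / (k - 1)

def fft_exponent_full(k):
    """Exponent k/(k-2) when recursing into a normal multiply."""
    return k / (k - 2)

TOOM3 = math.log(5) / math.log(3)    # about 1.465

first_modular = next(k for k in range(2, 20) if fft_exponent_modular(k) < TOOM3)
first_full = next(k for k in range(3, 20) if fft_exponent_full(k) < TOOM3)
```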
  8316  
  8317  The way @math{N} is split into @math{2^k} pieces and then @math{2M+k+3} is
  8318  rounded up to a multiple of @math{2^k} and @code{mp_bits_per_limb} means that
  8319  when @math{2^k@ge{}@nicode{mp\_bits\_per\_limb}} the effective @math{N} is a
  8320  multiple of @m{2^{2k-1},2^(2k-1)} bits.  The @math{+k+3} means some values of
  8321  @math{N} just under such a multiple will be rounded to the next.  The
  8322  complexity calculations above assume that a favourable size is used, meaning
  8323  one which isn't padded through rounding, and it's also assumed that the extra
  8324  @math{+k+3} bits are negligible at typical FFT sizes.
  8325  
  8326  The practical effect of the @m{2^{2k-1},2^(2k-1)} constraint is to introduce a
  8327  step-effect into measured speeds.  For example @math{k=8} will round @math{N}
  8328  up to a multiple of 32768 bits, or 1024 limbs of product with a 32-bit limb,
  8329  so there'll be 512 limb groups of operand sizes for which @code{mpn_mul_n} runs at the same speed.  Or for
  8330  @math{k=9} groups of 2048 limbs, @math{k=10} groups of 8192 limbs, etc.  In
  8331  practice it's been found each @math{k} is used at quite small multiples of its
  8332  size constraint and so the step effect is quite noticeable in a time versus
  8333  size graph.
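
The group sizes quoted above can be reproduced with a little arithmetic.
The Python sketch below assumes that @math{N} counts the bits of a full
product, so that each operand is half that size; this is an interpretation
that makes the quoted figures come out, not something taken from the code.
The function name is hypothetical.

```python
def operand_step_limbs(k, bits_per_limb=32):
    """Operand-size step, in limbs, implied by the 2^(2k-1) bit constraint."""
    n_step_bits = 1 << (2 * k - 1)                  # N is a multiple of this
    product_limbs = n_step_bits // bits_per_limb    # step in product size
    return product_limbs // 2                       # assumed: operands are half of N
```

For a 32-bit limb this gives 512, 2048 and 8192 limbs for @math{k=8}, 9 and
10.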
  8334  
  8335  The threshold determinations currently measure at the mid-points of size
  8336  steps, but this is sub-optimal since at the start of a new step it can happen
  8337  that it's better to go back to the previous @math{k} for a while.  Something
  8338  more sophisticated for @code{MUL_FFT_TABLE} and @code{SQR_FFT_TABLE} will be
  8339  needed.
  8340  
  8341  
  8342  @node Other Multiplication, Unbalanced Multiplication, FFT Multiplication, Multiplication Algorithms
  8343  @subsection Other Multiplication
  8344  @cindex Toom multiplication
  8345  
  8346  The Toom algorithms described above (@pxref{Toom 3-Way Multiplication},
  8347  @pxref{Toom 4-Way Multiplication}) generalize to split into an arbitrary
  8348  number of pieces, as per Knuth section 4.3.3 algorithm C@.  This is not
  8349  currently used.  The notes here are merely for interest.
  8350  
  8351  In general a split into @math{r+1} pieces is made, and evaluations and
  8352  pointwise multiplications done at @m{2r+1,2*r+1} points.  A 4-way split does 7
  8353  pointwise multiplies, 5-way does 9, etc.  Asymptotically an @math{(r+1)}-way
  8354  algorithm is @m{O(N^{\log(2r+1)/\log(r+1)}), O(N^(log(2*r+1)/log(r+1)))}.  Only
  8355  the pointwise multiplications count towards big-@math{O} complexity, but the
  8356  time spent in the evaluate and interpolate stages grows with @math{r} and has
  8357  a significant practical impact, with the asymptotic advantage of each @math{r}
  8358  realized only at bigger and bigger sizes.  The overheads grow as
  8359  @m{O(Nr),O(N*r)}, whereas in an @math{r=2^k} FFT they grow only as @m{O(N \log
  8360  r), O(N*log(r))}.
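
For reference, the exponent formula can be evaluated for the first few
splits with a few lines of Python (the function name is made up for this
sketch).  It reproduces the exponents quoted for Karatsuba, Toom-3 and
Toom-4 and shows how slowly further splits improve on them.

```python
import math

def toom_exponent(pieces):
    """Exponent log(2r+1)/log(r+1) for an (r+1)-way split."""
    r = pieces - 1
    return math.log(2 * r + 1) / math.log(r + 1)

def pointwise_multiplies(pieces):
    """Number of evaluation points, 2r+1, for an (r+1)-way split."""
    return 2 * (pieces - 1) + 1

exponents = [round(toom_exponent(p), 3) for p in range(2, 7)]
```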
  8361  
  8362  Knuth algorithm C evaluates at points 0,1,2,@dots{},@m{2r,2*r}, but exercise 4
  8363  uses @math{-r},@dots{},0,@dots{},@math{r} and the latter saves some small
  8364  multiplies in the evaluate stage (or rather trades them for additions), and
  8365  has a further saving of nearly half the interpolate steps.  The idea is to
  8366  separate odd and even final coefficients and then perform algorithm C steps C7
  8367  and C8 on them separately.  The divisors at step C7 become @math{j^2} and the
  8368  multipliers at C8 become @m{2tj-j^2,2*t*j-j^2}.
  8369  
  8370  Splitting odd and even parts through positive and negative points can be
  8371  thought of as using @math{-1} as a square root of unity.  If a 4th root of
  8372  unity was available then a further split and speedup would be possible, but no
  8373  such root exists for plain integers.  Going to complex integers with
  8374  @m{i=\sqrt{-1}, i=sqrt(-1)} doesn't help, essentially because in Cartesian
  8375  form it takes three real multiplies to do a complex multiply.  The existence
  8376  of @m{2^k,2^k'}th roots of unity in a suitable ring or field lets the fast
  8377  Fourier transform keep splitting and get to @m{O(N \log r), O(N*log(r))}.
  8378  
  8379  Floating point FFTs use complex numbers approximating Nth roots of unity.
  8380  Some processors have special support for such FFTs.  But these are not used in
  8381  GMP since it's very difficult to guarantee an exact result (to some number of
  8382  bits).  An occasional difference of 1 in the last bit might not matter to a
  8383  typical signal processing algorithm, but is of course of vital importance to
  8384  GMP.
  8385  
  8386  
  8387  @node Unbalanced Multiplication,  , Other Multiplication, Multiplication Algorithms
  8388  @subsection Unbalanced Multiplication
  8389  @cindex Unbalanced multiplication
  8390  
  8391  Multiplication of operands with different sizes, both below
  8392  @code{MUL_TOOM22_THRESHOLD}, is done with plain schoolbook multiplication
  8393  (@pxref{Basecase Multiplication}).
  8394  
  8395  For really large operands, we invoke FFT directly.
  8396  
  8397  For operands between these sizes, we use Toom inspired algorithms suggested by
  8398  Alberto Zanoni and Marco Bodrato.  The idea is to split the operands into
  8399  polynomials of different degree.  GMP currently splits the smaller operand
  8400  into 2 coefficients, i.e., a polynomial of degree 1, but the larger operand
  8401  can be split into 2, 3, or 4 coefficients, i.e., a polynomial of degree 1 to
  8402  3.
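
As an example of the idea, here is a Python sketch of a split into 3 and 2
coefficients.  The product polynomial has degree 3, so 4 points suffice; the
points 0, 1, @math{-1} and infinity used here are an assumption made for the
sketch, as are the function name and the @code{k_bits} parameter.

```python
def toom32_mul(x, y, k_bits):
    """Multiply with x split into 3 pieces and y into 2, k_bits per piece."""
    b = 1 << k_bits
    mask = b - 1
    x0, x1, x2 = x & mask, (x >> k_bits) & mask, x >> (2 * k_bits)
    y0, y1 = y & mask, y >> k_bits

    # Four pointwise multiplies instead of the six cross products.
    v_0 = x0 * y0
    v_1 = (x2 + x1 + x0) * (y1 + y0)
    v_m1 = (x2 - x1 + x0) * (y0 - y1)
    v_inf = x2 * y1

    # W(t) = w3*t^3 + w2*t^2 + w1*t + w0
    w0, w3 = v_0, v_inf
    w2 = (v_1 + v_m1) // 2 - w0     # (W(1) + W(-1))/2 = w2 + w0
    w1 = (v_1 - v_m1) // 2 - w3     # (W(1) - W(-1))/2 = w3 + w1
    return ((w3 * b + w2) * b + w1) * b + w0
```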
  8403  
  8404  @c FIXME: This is mighty ugly, but a cleaner @need triggers texinfo bugs that
  8405  @c screw up layout here and there in the rest of the manual.
  8406  @c @tex
  8407  @c \goodbreak
  8408  @c @end tex
  8409  @node Division Algorithms, Greatest Common Divisor Algorithms, Multiplication Algorithms, Algorithms
  8410  @section Division Algorithms
  8411  @cindex Division algorithms
  8412  
  8413  @menu
  8414  * Single Limb Division::
  8415  * Basecase Division::
  8416  * Divide and Conquer Division::
  8417  * Block-Wise Barrett Division::
  8418  * Exact Division::
  8419  * Exact Remainder::
  8420  * Small Quotient Division::
  8421  @end menu
  8422  
  8423  
  8424  @node Single Limb Division, Basecase Division, Division Algorithms, Division Algorithms
  8425  @subsection Single Limb Division
  8426  
  8427  N@cross{}1 division is implemented using repeated 2@cross{}1 divisions from
  8428  high to low, either with a hardware divide instruction or a multiplication by
  8429  inverse, whichever is best on a given CPU.
  8430  
  8431  The multiply by inverse follows ``Improved division by invariant integers'' by
  8432  M@"oller and Granlund (@pxref{References}) and is implemented as
  8433  @code{udiv_qrnnd_preinv} in @file{gmp-impl.h}.  The idea is to have a
  8434  fixed-point approximation to @math{1/d} (see @code{invert_limb}) and then
  8435  multiply by the high limb (plus one bit) of the dividend to get a quotient
  8436  @math{q}.  With @math{d} normalized (high bit set), @math{q} is no more than 1
  8437  too small.  Subtracting @m{qd,q*d} from the dividend gives a remainder, and
  8438  reveals whether @math{q} or @math{q-1} is correct.
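
A 2@cross{}1 division by a precomputed inverse can be sketched in Python as
below.  The steps are modelled on the 2/1 algorithm in the M@"oller and
Granlund paper as understood here, and are not a transcription of the
@code{udiv_qrnnd_preinv} macro.  The Python names and the @code{bits}
parameter (the limb size) are inventions of this sketch.  The divisor must be
normalized and the high dividend limb must be less than the divisor.

```python
def invert_limb(d, bits=64):
    """Fixed-point reciprocal floor((B^2 - 1) / d) - B for normalized d."""
    base = 1 << bits
    return (base * base - 1) // d - base

def div_2by1_preinv(u1, u0, d, v, bits=64):
    """Divide u1*B + u0 by d using reciprocal v; needs u1 < d, d normalized."""
    mask = (1 << bits) - 1
    q = v * u1 + (u1 << bits) + u0        # first multiply: candidate quotient
    q1, q0 = (q >> bits) & mask, q & mask
    q1 = (q1 + 1) & mask
    r = (u0 - q1 * d) & mask              # second multiply: remainder mod B
    if r > q0:                            # candidate was one too big
        q1 = (q1 - 1) & mask
        r = (r + d) & mask
    if r >= d:                            # rarely, one too small
        q1 += 1
        r -= d
    return q1, r
```

An N@cross{}1 division would call @code{invert_limb} once and then apply
@code{div_2by1_preinv} to each limb from high to low, feeding each remainder
in as the next high limb.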
  8439  
  8440  The result is a division done with two multiplications and four or five
  8441  arithmetic operations.  On CPUs with low latency multipliers this can be much
  8442  faster than a hardware divide, though the cost of calculating the inverse at
  8443  the start may mean it's only better on inputs bigger than say 4 or 5 limbs.
  8444  
  8445  When a divisor must be normalized, either for the generic C
  8446  @code{__udiv_qrnnd_c} or the multiply by inverse, the division performed is
  8447  actually @m{a2^k,a*2^k} by @m{d2^k,d*2^k} where @math{a} is the dividend and
  8448  @math{k} is the power necessary to have the high bit of @m{d2^k,d*2^k} set.
  8449  The bit shifts for the dividend are usually accomplished ``on the fly''
  8450  meaning by extracting the appropriate bits at each step.  Done this way the
  8451  quotient limbs come out aligned ready to store.  When only the remainder is
  8452  wanted, an alternative is to take the dividend limbs unshifted and calculate
  8453  @m{r = a \bmod d2^k, r = a mod d*2^k} followed by an extra final step @m{r2^k
  8454  \bmod d2^k, r*2^k mod d*2^k}.  This can help on CPUs with poor bit shifts or
  8455  few registers.
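
Both normalization schemes can be checked with a short Python sketch.  The
built-in @code{divmod} stands in for the limb-by-limb loop, and the function
names and the @code{bits} parameter are hypothetical; @math{d} is assumed to
fit in one limb.

```python
def normalize_shift(d, bits=64):
    """Power k needed to set the high bit of a single-limb divisor d."""
    return bits - d.bit_length()

def divmod_normalized(a, d, bits=64):
    """Divide a*2^k by d*2^k: same quotient, remainder scaled by 2^k."""
    k = normalize_shift(d, bits)
    q, r_shifted = divmod(a << k, d << k)
    return q, r_shifted >> k

def mod_unshifted(a, d, bits=64):
    """Remainder only: reduce the unshifted dividend, then one extra step."""
    k = normalize_shift(d, bits)
    dn = d << k
    r = a % dn                # r = a mod d*2^k, dividend limbs unshifted
    r = (r << k) % dn         # extra final step: r*2^k mod d*2^k
    return r >> k             # undo the scaling to get a mod d
```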
  8456  
  8457  The multiply by inverse can be done two limbs at a time.  The calculation is
  8458  basically the same, but the inverse is two limbs and the divisor treated as if
  8459  padded with a low zero limb.  This means more work, since the inverse will
  8460  need a 2@cross{}2 multiply, but the four 1@cross{}1s to do that are
  8461  independent and can therefore be done partly or wholly in parallel.  Likewise
  8462  for a 2@cross{}1 calculating @m{qd,q*d}.  The net effect is to process two
  8463  limbs with roughly the same two multiplies worth of latency that one limb at a
  8464  time gives.  This extends to 3 or 4 limbs at a time, though the extra work to
  8465  apply the inverse will almost certainly soon reach the limits of multiplier
  8466  throughput.
  8467  
  8468  A similar approach in reverse can be taken to process just half a limb at a
  8469  time if the divisor is only a half limb.  In this case the 1@cross{}1 multiply
  8470  for the inverse effectively becomes two @m{{1\over2}\times1, (1/2)x1} for each
  8471  limb, which can be a saving on CPUs with a fast half limb multiply, or in fact
  8472  if the only multiply is a half limb, and especially if it's not pipelined.
  8473  
  8474  
  8475  @node Basecase Division, Divide and Conquer Division, Single Limb Division, Division Algorithms
  8476  @subsection Basecase Division
  8477  
  8478  Basecase N@cross{}M division is like long division done by hand, but in base
  8479  @m{2\GMPraise{@code{mp\_bits\_per\_limb}}, 2^mp_bits_per_limb}.  See Knuth
  8480  section 4.3.1 algorithm D, and @file{mpn/generic/sb_divrem_mn.c}.
  8481  
  8482  Briefly stated, while the dividend remains larger than the divisor, a high
  8483  quotient limb is formed and the M@cross{}1 product @m{qd,q*d} subtracted at
  8484  the top end of the dividend.  With a normalized divisor (most significant bit
  8485  set), each quotient limb can be formed with a 2@cross{}1 division and a
  8486  1@cross{}1 multiplication plus some subtractions.  The 2@cross{}1 division is
  8487  by the high limb of the divisor and is done either with a hardware divide or a
  8488  multiply by inverse (the same as in @ref{Single Limb Division}) whichever is
  8489  faster.  Such a quotient is sometimes one too big, requiring an addback of the
  8490  divisor, but that happens rarely.
  8491  
  8492  With Q=N@minus{}M being the number of quotient limbs, this is an
  8493  @m{O(QM),O(Q*M)} algorithm and will run at a speed similar to a basecase
  8494  Q@cross{}M multiplication, differing in fact only in the extra multiply and
  8495  divide for each of the Q quotient limbs.
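
The loop can be sketched in Python as follows, with big integers standing in
for limb vectors.  For simplicity the sketch walks over every dividend limb
(the first few steps just produce zero quotient limbs), and it estimates each
quotient limb from the high divisor limb alone, so the addback is written as
a loop.  The function name and @code{bits} parameter are hypothetical, and
this is not the code in @file{mpn/generic}.

```python
def basecase_divrem(a, d, bits=32):
    """Schoolbook division of a >= 0 by d > 0, one quotient limb per step."""
    base = 1 << bits
    m = (d.bit_length() + bits - 1) // bits      # divisor limbs
    shift = m * bits - d.bit_length()            # normalize: top bit of d set
    a, d = a << shift, d << shift
    d_high = d >> ((m - 1) * bits)               # high limb of the divisor
    n = (a.bit_length() + bits - 1) // bits      # dividend limbs
    q, r = 0, 0
    for i in reversed(range(n)):
        limb = (a >> (i * bits)) & (base - 1)
        r = (r << bits) | limb                   # bring down the next limb
        # 2x1 division of the top of r by the high divisor limb.
        q_hat = min((r >> ((m - 1) * bits)) // d_high, base - 1)
        r -= q_hat * d                           # multiply and subtract
        while r < 0:                             # estimate too big: add back
            q_hat -= 1
            r += d
        q = (q << bits) | q_hat
    return q, r >> shift
```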
  8496  
  8497  
  8498  @node Divide and Conquer Division, Block-Wise Barrett Division, Basecase Division, Division Algorithms
  8499  @subsection Divide and Conquer Division
  8500  
  8501  For divisors larger than @code{DC_DIV_QR_THRESHOLD}, division is done by dividing.
  8502  Or to be precise by a recursive divide and conquer algorithm based on work by
  8503  Moenck and Borodin, Jebelean, and Burnikel and Ziegler (@pxref{References}).
  8504  
  8505  The algorithm consists essentially of recognising that a 2N@cross{}N division
  8506  can be done with the basecase division algorithm (@pxref{Basecase Division}),
  8507  but using N/2 limbs as a base, not just a single limb.  This way the
  8508  multiplications that arise are (N/2)@cross{}(N/2) and can take advantage of
  8509  Karatsuba and higher multiplication algorithms (@pxref{Multiplication
  8510  Algorithms}).  The two ``digits'' of the quotient are formed by recursive
  8511  N@cross{}(N/2) divisions.
  8512  
  8513  If the (N/2)@cross{}(N/2) multiplies are done with a basecase multiplication
  8514  then the work is about the same as a basecase division, but with more function
  8515  call overheads and with some subtractions separated from the multiplies.
  8516  These overheads mean that it's only when N/2 is above
  8517  @code{MUL_TOOM22_THRESHOLD} that divide and conquer is of use.
  8518  
  8519  @code{DC_DIV_QR_THRESHOLD} is based on the divisor size N, so it will be somewhere
  8520  above twice @code{MUL_TOOM22_THRESHOLD}, but how much above depends on the
  8521  CPU@.  An optimized @code{mpn_mul_basecase} can lower @code{DC_DIV_QR_THRESHOLD} a
  8522  little by offering a ready-made advantage over repeated @code{mpn_submul_1}
  8523  calls.
  8524  
  8525  Divide and conquer is asymptotically @m{O(M(N)\log N),O(M(N)*log(N))} where
  8526  @math{M(N)} is the time for an N@cross{}N multiplication done with FFTs.  The
  8527  actual time is a sum over multiplications of the recursed sizes, as can be
  8528  seen near the end of section 2.2 of Burnikel and Ziegler.  For example, within
  8529  the Toom-3 range, divide and conquer is @m{2.63M(N), 2.63*M(N)}.  With higher
  8530  algorithms the @math{M(N)} term improves and the multiplier tends to @m{\log
  8531  N, log(N)}.  In practice, at moderate to large sizes, a 2N@cross{}N division
  8532  is about 2 to 4 times slower than an N@cross{}N multiplication.
  8533  
  8534  
  8535  @node Block-Wise Barrett Division, Exact Division, Divide and Conquer Division, Division Algorithms
  8536  @subsection Block-Wise Barrett Division
  8537  
  8538  For the largest divisions, a block-wise Barrett division algorithm is used.
  8539  Here, the divisor is inverted to a precision determined by the relative size of
  8540  the dividend and divisor.  Blocks of quotient limbs are then generated by
  8541  multiplying blocks from the dividend by the inverse.
  8542  
  8543  Our block-wise algorithm computes a smaller inverse than in the plain Barrett
  8544  algorithm.  For a @math{2n/n} division, the inverse will be just @m{\lceil n/2
  8545  \rceil, ceil(n/2)} limbs.
  8546  
  8547  
  8548  @node Exact Division, Exact Remainder, Block-Wise Barrett Division, Division Algorithms
  8549  @subsection Exact Division
  8550  
  8551  
  8552  A so-called exact division is when the dividend is known to be an exact
  8553  multiple of the divisor.  Jebelean's exact division algorithm uses this
  8554  knowledge to make some significant optimizations (@pxref{References}).
  8555  
  8556  The idea can be illustrated in decimal for example with 368154 divided by
  8557  543.  Because the low digit of the dividend is 4, the low digit of the
  8558  quotient must be 8.  This is arrived at from @m{4 \mathord{\times} 7 \bmod 10,
  8559  4*7 mod 10}, using the fact 7 is the modular inverse of 3 (the low digit of
  8560  the divisor), since @m{3 \mathord{\times} 7 \mathop{\equiv} 1 \bmod 10, 3*7
  8561  @equiv{} 1 mod 10}.  So @m{8\mathord{\times}543 = 4344,8*543=4344} can be
  8562  subtracted from the dividend leaving 363810.  Notice the low digit has become
  8563  zero.
  8564  
  8565  The procedure is repeated at the second digit, with the next quotient digit 7
  8566  (@m{1 \mathord{\times} 7 \bmod 10, 7 @equiv{} 1*7 mod 10}), subtracting
  8567  @m{7\mathord{\times}543 = 3801,7*543=3801}, leaving 325800.  And finally at
  8568  the third digit with quotient digit 6 (@m{8 \mathord{\times} 7 \bmod 10, 8*7
  8569  mod 10}), subtracting @m{6\mathord{\times}543 = 3258,6*543=3258} leaving 0.
  8570  So the quotient is 678.
  8571  
  8572  Notice however that the multiplies and subtractions don't need to extend past
  8573  the low three digits of the dividend, since that's enough to determine the
  8574  three quotient digits.  For the last quotient digit no subtraction is needed
  8575  at all.  On a 2N@cross{}N division like this one, only about half the work of
  8576  a normal basecase division is necessary.
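
The decimal walk-through above can be reproduced with a short Python sketch.
This is an illustration only, not GMP code: the name
@code{divexact_low_to_high} is hypothetical, decimal digits stand in for
limbs, and the low digit of the divisor is assumed to be invertible modulo the
base.  Unlike the optimized algorithm it does a full subtraction at every
step, and it uses an ordinary remainder purely as an input check.

```python
def divexact_low_to_high(a, d, base=10):
    """Exact division a/d, forming quotient digits from the low end.

    Hypothetical helper for illustration.  Requires a >= 0, d > 0, a an
    exact multiple of d, and gcd(d % base, base) == 1.
    """
    if a < 0 or d <= 0 or a % d != 0:
        raise ValueError("a must be a non-negative exact multiple of d")
    inv = pow(d % base, -1, base)      # modular inverse (Python 3.8+)
    q = 0
    weight = 1                         # base**i for the current digit
    while a != 0:
        digit = (a % base) * inv % base   # low quotient digit
        q += digit * weight
        a = (a - digit * d) // base       # low digit is now zero, drop it
        weight *= base
    return q
```

With the numbers from the text, @code{divexact_low_to_high(368154, 543)}
forms the digits 8, 7 and 6 in that order.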
  8577  
  8578  For an N@cross{}M exact division producing Q=N@minus{}M quotient limbs, the
  8579  saving over a normal basecase division is in two parts.  Firstly, each of the
  8580  Q quotient limbs needs only one multiply, not a 2@cross{}1 divide and
  8581  multiply.  Secondly, the crossproducts are reduced when @math{Q>M} to
  8582  @m{QM-M(M+1)/2,Q*M-M*(M+1)/2}, or when @math{Q@le{}M} to @m{Q(Q-1)/2,
  8583  Q*(Q-1)/2}.  Notice the savings are complementary.  If Q is big then many
  8584  divisions are saved, or if Q is small then the crossproducts reduce to a small
  8585  number.
  8586  
  8587  The modular inverse used is calculated efficiently by @code{binvert_limb} in
  8588  @file{gmp-impl.h}.  This does four multiplies for a 32-bit limb, or six for a
  8589  64-bit limb.  @file{tune/modlinv.c} has some alternate implementations that
  8590  might suit processors better at bit twiddling than multiplying.
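
The multiply counts quoted above are consistent with a Newton iteration that
doubles the number of correct low bits at each step, at a cost of two
multiplies per step.  The following Python sketch shows that iteration.  It is
not the macro from @file{gmp-impl.h}: the name @code{binvert_sketch} is
hypothetical, and the 8-bit starting value (an assumption of this sketch) is
obtained from Python's @code{pow} where a real implementation might use a
small lookup table.

```python
def binvert_sketch(n, bits=64):
    """Inverse of odd n modulo 2**bits by Newton iteration (illustrative)."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    mask = (1 << bits) - 1
    inv = pow(n & 0xFF, -1, 256)       # correct to 8 bits (assumed start)
    correct_bits = 8
    while correct_bits < bits:
        # If inv*n == 1 mod 2**k then the new inv satisfies it mod 2**(2k).
        inv = (2 * inv - inv * inv * n) & mask
        correct_bits *= 2
    return inv
```

Starting from 8 bits, two steps reach 32 bits and three steps reach 64 bits.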
  8591  
  8592  The sub-quadratic exact division described by Jebelean in ``Exact Division
  8593  with Karatsuba Complexity'' is not currently implemented.  It uses a
  8594  rearrangement similar to the divide and conquer for normal division
  8595  (@pxref{Divide and Conquer Division}), but operating from low to high.  A
  8596  further possibility not currently implemented is ``Bidirectional Exact Integer
  8597  Division'' by Krandick and Jebelean which forms quotient limbs from both the
  8598  high and low ends of the dividend, and can halve once more the number of
  8599  crossproducts needed in a 2N@cross{}N division.
  8600  
  8601  A special case exact division by 3 exists in @code{mpn_divexact_by3},
  8602  supporting Toom-3 multiplication and @code{mpq} canonicalizations.  It forms
  8603  quotient digits with a multiply by the modular inverse of 3 (which is
  8604  @code{0xAA..AAB}) and uses two comparisons to determine a borrow for the next
  8605  limb.  The multiplications don't need to be on the dependent chain, as long as
  8606  the effect of the borrows is applied, which can help chips with pipelined
  8607  multipliers.
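
A limb-by-limb division by 3 along these lines might look as follows in
Python.  This is a sketch under stated assumptions rather than the GMP source:
the helper names are hypothetical, limbs are held least significant first in a
list, and the borrow passed to the next limb is formed from the subtraction
borrow plus two comparisons of the quotient limb against one third and two
thirds of the limb range.

```python
def to_limbs(n, count, bits=64):
    """Split non-negative n into `count` limbs, least significant first."""
    mask = (1 << bits) - 1
    return [(n >> (bits * i)) & mask for i in range(count)]

def from_limbs(limbs, bits=64):
    """Inverse of to_limbs."""
    return sum(limb << (bits * i) for i, limb in enumerate(limbs))

def divexact_by3_sketch(limbs, bits=64):
    """Return (quotient limbs, final borrow); the borrow is 0 for a multiple of 3."""
    B = 1 << bits
    inv3 = (2 * B + 1) // 3            # 0xAA..AAB, inverse of 3 mod B
    one_third = B // 3 + 1             # smallest q with 3*q >= B
    two_thirds = 2 * B // 3 + 1        # smallest q with 3*q >= 2*B
    quotient = []
    c = 0
    for s in limbs:
        borrow = 1 if s < c else 0
        low = (s - c) % B              # apply the incoming borrow
        q = low * inv3 % B             # quotient limb
        quotient.append(q)
        # 3*q = low + carry*B, and carry is found with two comparisons
        c = borrow + (q >= one_third) + (q >= two_thirds)
    return quotient, c
```

With @math{Q} the returned quotient and @math{c} the final borrow, the limbs
satisfy @math{3Q = A + c b^n}, so a non-zero @math{c} indicates that the
input was not a multiple of 3.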
  8608  
  8609  
  8610  @node Exact Remainder, Small Quotient Division, Exact Division, Division Algorithms
  8611  @subsection Exact Remainder
  8612  @cindex Exact remainder
  8613  
  8614  If the exact division algorithm is done with a full subtraction at each stage
  8615  and the dividend isn't a multiple of the divisor, then low zero limbs are
  8616  produced but with a remainder in the high limbs.  For dividend @math{a},
  8617  divisor @math{d}, quotient @math{q}, and @m{b = 2
  8618  \GMPraise{@code{mp\_bits\_per\_limb}}, b = 2^mp_bits_per_limb}, this remainder
  8619  @math{r} is of the form
  8620  @tex
  8621  $$ a = qd + r b^n $$
  8622  @end tex
  8623  @ifnottex
  8624  
  8625  @example
  8626  a = q*d + r*b^n
  8627  @end example
  8628  
  8629  @end ifnottex
  8630  @math{n} represents the number of zero limbs produced by the subtractions,
  8631  that being the number of limbs produced for @math{q}.  @math{r} will be in the
  8632  range @math{0@le{}r<d} and can be viewed as a remainder, but one shifted up by
  8633  a factor of @math{b^n}.
  8634  
  8635  Carrying out full subtractions at each stage means the same number of cross
  8636  products must be done as in a normal division, but there are still some single limb
  8637  divisions saved.  When @math{d} is a single limb some simplifications arise,
  8638  providing good speedups on a number of processors.
  8639  
  8640  The functions @code{mpn_divexact_by3}, @code{mpn_modexact_1_odd} and the
  8641  internal @code{mpn_redc_X} functions differ subtly in how they return @math{r},
  8642  leading to some negations in the above formula, but all are essentially the
  8643  same.
  8644  
  8645  @cindex Divisibility algorithm
  8646  @cindex Congruence algorithm
  8647  Clearly @math{r} is zero when @math{a} is a multiple of @math{d}, and this
  8648  leads to divisibility or congruence tests which are potentially more efficient
  8649  than a normal division.
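
As a rough illustration of such a divisibility test for an odd single-limb
divisor, the Python sketch below runs the exact division from the low end and
keeps only the carry.  The function names are hypothetical and the return
convention is an assumption of this sketch (GMP's own functions differ in
details such as sign): the value @math{c} returned satisfies @math{a + c b^n
@equiv{} 0 @bmod d} and lies in the range @math{0@le{}c@le{}d}.

```python
def modexact_odd_sketch(limbs, d, bits=64):
    """Exact-remainder style carry for limbs (low first) and odd 0 < d < 2**bits."""
    if d <= 0 or d % 2 == 0 or d >> bits:
        raise ValueError("d must be an odd single limb")
    B = 1 << bits
    inv = pow(d, -1, B)                # inverse of d modulo the limb base
    c = 0
    for s in limbs:
        borrow = 1 if s < c else 0
        low = (s - c) % B
        q = low * inv % B              # quotient limb, then discarded
        c = (q * d >> bits) + borrow   # high limb of q*d plus borrow
    return c

def is_divisible_sketch(limbs, d, bits=64):
    """True when the number held in limbs is a multiple of odd d."""
    return modexact_odd_sketch(limbs, d, bits) % d == 0
```

No 2@cross{}1 division is needed per limb, only multiplies by @code{inv} and
@code{d}.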
  8650  
  8651  The factor of @math{b^n} on @math{r} can be ignored in a GCD when @math{d} is
  8652  odd, hence the use of @code{mpn_modexact_1_odd} by @code{mpn_gcd_1} and
  8653  @code{mpz_kronecker_ui} etc (@pxref{Greatest Common Divisor Algorithms}).
  8654  
  8655  Montgomery's REDC method for modular multiplications uses operands of the form
  8656  of @m{xb^{-n}, x*b^-n} and @m{yb^{-n}, y*b^-n} and on calculating @m{(xb^{-n})
  8657  (yb^{-n}), (x*b^-n)*(y*b^-n)} uses the factor of @math{b^n} in the exact
  8658  remainder to reach a product in the same form @m{(xy)b^{-n}, (x*y)*b^-n}
  8659  (@pxref{Modular Powering Algorithm}).
  8660  
  8661  Notice that @math{r} generally gives no useful information about the ordinary
  8662  remainder @math{a @bmod d} since @math{b^n @bmod d} could be anything.  If
  8663  however @math{b^n @equiv{} 1 @bmod d}, then @math{r} is the negative of the
  8664  ordinary remainder.  This occurs whenever @math{d} is a factor of
  8665  @math{b^n-1}, as for example with 3 in @code{mpn_divexact_by3}.  For a 32 or
  8666  64 bit limb other such factors include 5, 17 and 257, but no particular use
  8667  has been found for this.
  8668  
  8669  
  8670  @node Small Quotient Division,  , Exact Remainder, Division Algorithms
  8671  @subsection Small Quotient Division
  8672  
  8673  An N@cross{}M division where the number of quotient limbs Q=N@minus{}M is
  8674  small can be optimized somewhat.
  8675  
  8676  An ordinary basecase division normalizes the divisor by shifting it to make
  8677  the high bit set, shifting the dividend accordingly, and shifting the
  8678  remainder back down at the end of the calculation.  This is wasteful if only a
  8679  few quotient limbs are to be formed.  Instead a division of just the top
  8680  @m{\rm2Q,2*Q} limbs of the dividend by the top Q limbs of the divisor can be
  8681  used to form a trial quotient.  This requires only those limbs normalized, not
  8682  the whole of the divisor and dividend.
  8683  
  8684  A multiply and subtract then applies the trial quotient to the M@minus{}Q
  8685  unused limbs of the divisor and N@minus{}Q dividend limbs (which includes Q
  8686  limbs remaining from the trial quotient division).  The starting trial
  8687  quotient can be 1 or 2 too big, but all cases of 2 too big and most cases of 1
  8688  too big are detected by first comparing the most significant limbs that will
  8689  arise from the subtraction.  An addback is done if the quotient still turns
  8690  out to be 1 too big.
  8691  
  8692  This whole procedure is essentially the same as one step of the basecase
  8693  algorithm done in a Q limb base, though with the trial quotient test done only
  8694  with the high limbs, not an entire Q limb ``digit'' product.  The correctness
  8695  of this weaker test can be established by following the argument of Knuth
  8696  section 4.3.1 exercise 20 but with the @m{v_2 \GMPhat q > b \GMPhat r
  8697  + u_2, v2*q>b*r+u2} condition appropriately relaxed.
  8698  
  8699  
  8700  @need 1000
  8701  @node Greatest Common Divisor Algorithms, Powering Algorithms, Division Algorithms, Algorithms
  8702  @section Greatest Common Divisor
  8703  @cindex Greatest common divisor algorithms
  8704  @cindex GCD algorithms
  8705  
  8706  @menu
  8707  * Binary GCD::
  8708  * Lehmer's Algorithm::
  8709  * Subquadratic GCD::
  8710  * Extended GCD::
  8711  * Jacobi Symbol::
  8712  @end menu
  8713  
  8714  
  8715  @node Binary GCD, Lehmer's Algorithm, Greatest Common Divisor Algorithms, Greatest Common Divisor Algorithms
  8716  @subsection Binary GCD
  8717  
  8718  At small sizes GMP uses an @math{O(N^2)} binary style GCD@.  This is described
  8719  in many textbooks, for example Knuth section 4.5.2 algorithm B@.  It simply
  8720  consists of successively reducing odd operands @math{a} and @math{b} using
  8721  
  8722  @quotation
  8723  @math{a,b = @abs{}(a-b),@min{}(a,b)} @*
  8724  strip factors of 2 from @math{a}
  8725  @end quotation
  8726  
  8727  The Euclidean GCD algorithm, as per Knuth algorithms E and A, repeatedly
  8728  computes the quotient @m{q = \lfloor a/b \rfloor, q = floor(a/b)} and replaces
  8729  @math{a,b} by @math{b, a - q b}.  The binary algorithm has so far been found to
  8730  be faster than the Euclidean algorithm everywhere.  One reason the binary
  8731  method does well is that the implied quotient at each step is usually small,
  8732  so often only one or two subtractions are needed to get the same effect as a
  8733  division.  Quotients 1, 2 and 3 for example occur 67.7% of the time, see Knuth
  8734  section 4.5.3 Theorem E.
  8735  
  8736  When the implied quotient is large, meaning @math{b} is much smaller than
  8737  @math{a}, then a division is worthwhile.  This is the basis for the initial
  8738  @math{a @bmod b} reductions in @code{mpn_gcd} and @code{mpn_gcd_1} (the latter
  8739  for both N@cross{}1 and 1@cross{}1 cases).  But after that initial reduction,
  8740  big quotients occur too rarely to make it worth checking for them.
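
The reduction step above can be sketched in Python as follows.  This is a
minimal illustration of the textbook binary method, not the code in
@code{mpn_gcd_1}; the names @code{trailing_zeros} and @code{binary_gcd} are
hypothetical, and the initial @math{a @bmod b} reduction is omitted.

```python
def trailing_zeros(x):
    """Number of low zero bits of x, for x > 0."""
    return (x & -x).bit_length() - 1

def binary_gcd(a, b):
    """GCD of non-negative integers by the binary method (illustrative)."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = trailing_zeros(a | b)      # common factors of 2
    a >>= trailing_zeros(a)            # make both operands odd
    b >>= trailing_zeros(b)
    while a != b:
        a, b = abs(a - b), min(a, b)   # a becomes even and non-zero
        a >>= trailing_zeros(a)        # strip factors of 2 from a
    return a << shift
```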
  8741  
  8742  @sp 1
  8743  The final @math{1@cross{}1} GCD in @code{mpn_gcd_1} is done in the generic C
  8744  code as described above.  For two N-bit operands, the algorithm takes about
  8745  0.68 iterations per bit.  For optimum performance some attention needs to be
  8746  paid to the way the factors of 2 are stripped from @math{a}.
  8747  
  8748  Firstly it may be noted that in twos complement the number of low zero bits on
  8749  @math{a-b} is the same as @math{b-a}, so counting or testing can begin on
  8750  @math{a-b} without waiting for @math{@abs{}(a-b)} to be determined.
  8751  
  8752  A loop stripping low zero bits tends not to branch predict well, since the
  8753  condition is data dependent.  But on average there are only a few low zeros, so
  8754  an option is to strip one or two bits arithmetically then loop for more (as
  8755  done for AMD K6).  Or use a lookup table to get a count for several bits then
  8756  loop for more (as done for AMD K7).  An alternative approach is to keep just
  8757  one of @math{a} or @math{b} odd and iterate
  8758  
  8759  @quotation
  8760  @math{a,b = @abs{}(a-b), @min{}(a,b)} @*
  8761  @math{a = a/2} if even @*
  8762  @math{b = b/2} if even
  8763  @end quotation
  8764  
  8765  This requires about 1.25 iterations per bit, but stripping of a single bit at
  8766  each step avoids any branching.  Repeating the bit strip reduces to about 0.9
  8767  iterations per bit, which may be a worthwhile tradeoff.
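
A Python sketch of this alternative iteration is given below.  It is
illustrative only: the name @code{gcd_one_odd} is hypothetical, and the
shift-by-a-computed-bit form of the halving is one assumed way of expressing a
step that needs no branch on the parity.

```python
def gcd_one_odd(a, b):
    """GCD by the single-bit-strip iteration; at least one input must be odd."""
    if a < 0 or b < 0 or ((a | b) & 1) == 0:
        raise ValueError("need non-negative inputs with at least one odd")
    if b == 0:
        return a
    while a != 0:
        a, b = abs(a - b), min(a, b)
        a >>= (a & 1) ^ 1              # a = a/2 if even, else unchanged
        b >>= (b & 1) ^ 1              # b = b/2 if even, else unchanged
    return b
```

Since at least one operand stays odd throughout, halving the other never
removes a common factor, and the sum of the operands decreases at every step.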
  8768  
  8769  Generally with the above approaches a speed of perhaps 6 cycles per bit can be
  8770  achieved, which is still not terribly fast with for instance a 64-bit GCD
  8771  taking nearly 400 cycles.  It's this sort of time which means it's not usually
  8772  advantageous to combine a set of divisibility tests into a GCD.
  8773  
  8774  Currently, the binary algorithm is used for GCD only when @math{N < 3}.
  8775  
  8776  @node Lehmer's Algorithm, Subquadratic GCD, Binary GCD, Greatest Common Divisor Algorithms
  8777  @comment  node-name,  next,  previous,  up
  8778  @subsection Lehmer's Algorithm
  8779  
  8780  Lehmer's improvement of the Euclidean algorithm is based on the observation
  8781  that the initial part of the quotient sequence depends only on the most
  8782  significant parts of the inputs. The variant of Lehmer's algorithm used in GMP
  8783  splits off the most significant two limbs, as suggested, e.g., in ``A
  8784  Double-Digit Lehmer-Euclid Algorithm'' by Jebelean (@pxref{References}). The
  8785  quotients of two double-limb inputs are collected as a 2 by 2 matrix with
  8786  single-limb elements. This is done by the function @code{mpn_hgcd2}. The
  8787  resulting matrix is applied to the inputs using @code{mpn_mul_1} and
  8788  @code{mpn_submul_1}. Each iteration usually reduces the inputs by almost one
  8789  limb. In the rare case of a large quotient, no progress can be made by
  8790  examining just the most significant two limbs, and the quotient is computed
  8791  using plain division.
  8792  
  8793  The resulting algorithm is asymptotically @math{O(N^2)}, just as the Euclidean
  8794  algorithm and the binary algorithm. The quadratic part of the work is
  8795  the calls to @code{mpn_mul_1} and @code{mpn_submul_1}. For small sizes, the
  8796  linear work is also significant. There are roughly @math{N} calls to the
  8797  @code{mpn_hgcd2} function. This function uses a couple of important
  8798  optimizations:
  8799  
  8800  @itemize
  8801  @item
  8802  It uses the same relaxed notion of correctness as @code{mpn_hgcd} (see next
  8803  section). This means that when called with the most significant two limbs of
  8804  two large numbers, the returned matrix does not always correspond exactly to
  8805  the initial quotient sequence for the two large numbers; the final quotient
  8806  may sometimes be one off.
  8807  
  8808  @item
  8809  It takes advantage of the fact the quotients are usually small. The division
  8810  operator is not used, since the corresponding assembler instruction is very
  8811  slow on most architectures. (This code could probably be improved further, since it
  8812  uses many branches that are unfriendly to prediction.)
  8813  
  8814  @item
  8815  It switches from double-limb calculations to single-limb calculations half-way
  8816  through, when the input numbers have been reduced in size from two limbs to
  8817  one and a half.
  8818  
  8819  @end itemize
  8820  
  8821  @node Subquadratic GCD, Extended GCD, Lehmer's Algorithm, Greatest Common Divisor Algorithms
  8822  @subsection Subquadratic GCD
  8823  
  8824  For inputs larger than @code{GCD_DC_THRESHOLD}, GCD is computed via the HGCD
  8825  (Half GCD) function, as a generalization of Lehmer's algorithm.
  8826  
  8827  Let the inputs @math{a,b} be of size @math{N} limbs each. Put @m{S=\lfloor N/2
  8828  \rfloor + 1, S = floor(N/2) + 1}. Then HGCD(a,b) returns a transformation
  8829  matrix @math{T} with non-negative elements, and reduced numbers @math{(c;d) =
  8830  T^{-1} (a;b)}. The reduced numbers @math{c,d} must be larger than @math{S}
  8831  limbs, while their difference @math{abs(c-d)} must fit in @math{S} limbs. The
  8832  matrix elements will also be of size roughly @math{N/2}.
  8833  
  8834  The HGCD base case uses Lehmer's algorithm, but with the above stop condition
  8835  that returns reduced numbers and the corresponding transformation matrix
  8836  half-way through. For inputs larger than @code{HGCD_THRESHOLD}, HGCD is
  8837  computed recursively, using the divide and conquer algorithm in ``On
  8838  Sch@"onhage's algorithm and subquadratic integer GCD computation'' by M@"oller
  8839  (@pxref{References}). The recursive algorithm consists of these main
  8840  steps.
  8841  
  8842  @itemize
  8843  
  8844  @item
  8845  Call HGCD recursively, on the most significant @math{N/2} limbs. Apply the
  8846  resulting matrix @math{T_1} to the full numbers, reducing them to a size just
  8847  above @math{3N/4}.
  8848  
  8849  @item
  8850  Perform a small number of division or subtraction steps to reduce the numbers
  8851  to size below @math{3N/4}. This is essential mainly for the unlikely case of
  8852  large quotients.
  8853  
  8854  @item
  8855  Call HGCD recursively, on the most significant @math{N/2} limbs of the reduced
  8856  numbers. Apply the resulting matrix @math{T_2} to the full numbers, reducing
  8857  them to a size just above @math{N/2}.
  8858  
  8859  @item
  8860  Compute @math{T = T_1 T_2}.
  8861  
  8862  @item
  8863  Perform a small number of division and subtraction steps to satisfy the
  8864  requirements, and return.
  8865  @end itemize
  8866  
  8867  GCD is then implemented as a loop around HGCD, similarly to Lehmer's
  8868  algorithm. Where Lehmer repeatedly chops off the top two limbs, calls
  8869  @code{mpn_hgcd2}, and applies the resulting matrix to the full numbers, the
  8870  sub-quadratic GCD chops off the most significant third of the limbs (the
  8871  proportion is a tuning parameter, and @math{1/3} seems to be more efficient
  8872  than, e.g., @math{1/2}), calls @code{mpn_hgcd}, and applies the resulting
  8873  matrix. Once the input numbers are reduced to size below
  8874  @code{GCD_DC_THRESHOLD}, Lehmer's algorithm is used for the rest of the work.
  8875  
  8876  The asymptotic running time of both HGCD and GCD is @m{O(M(N)\log N),O(M(N)*log(N))},
  8877  where @math{M(N)} is the time for multiplying two @math{N}-limb numbers.
  8878  
  8879  @comment  node-name,  next,  previous,  up
  8880  
  8881  @node Extended GCD, Jacobi Symbol, Subquadratic GCD, Greatest Common Divisor Algorithms
  8882  @subsection Extended GCD
  8883  
  8884  The extended GCD function, or GCDEXT, calculates @math{@gcd{}(a,b)} and also
  8885  cofactors @math{x} and @math{y} satisfying @m{ax+by=\gcd(a@C{}b),
  8886  a*x+b*y=gcd(a@C{}b)}. All the algorithms used for plain GCD are extended to
  8887  handle this case. The binary algorithm is used only for single-limb GCDEXT.
  8888  Lehmer's algorithm is used for sizes up to @code{GCDEXT_DC_THRESHOLD}. Above
  8889  this threshold, GCDEXT is implemented as a loop around HGCD, but with more
  8890  book-keeping to keep track of the cofactors. This gives the same asymptotic
  8891  running time as for GCD and HGCD, @m{O(M(N)\log N),O(M(N)*log(N))}.
  8892  
  8893  One difference from plain GCD is that while the inputs @math{a} and @math{b} are
  8894  reduced as the algorithm proceeds, the cofactors @math{x} and @math{y} grow in
  8895  size. This makes the tuning of the chopping-point more difficult. The current
  8896  code chops off the most significant half of the inputs for the call to HGCD in
  8897  the first iteration, and the most significant two thirds for the remaining
  8898  calls. This strategy could surely be improved. Also the stop condition for the
  8899  loop, where Lehmer's algorithm is invoked once the inputs are reduced below
  8900  @code{GCDEXT_DC_THRESHOLD}, could maybe be improved by taking into account the
  8901  current size of the cofactors.
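
For reference, the cofactor relation can be illustrated with the plain
extended Euclidean algorithm in Python.  This is deliberately the simplest
quadratic method, not the Lehmer or HGCD based code described above, and the
name @code{gcdext_sketch} is hypothetical.  It also shows the book-keeping
involved: the cofactors are updated with the same quotients as the inputs.

```python
def gcdext_sketch(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    x0, y0 = 1, 0                      # a == a_in*x0 + b_in*y0
    x1, y1 = 0, 1                      # b == a_in*x1 + b_in*y1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r                    # inputs shrink ...
        x0, x1 = x1, x0 - q * x1       # ... while cofactors grow
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0
```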
  8902  
  8903  @node Jacobi Symbol,  , Extended GCD, Greatest Common Divisor Algorithms
  8904  @subsection Jacobi Symbol
  8905  @cindex Jacobi symbol algorithm
  8906  
  8907  [This section is obsolete.  The current Jacobi code actually uses a very
  8908  efficient algorithm.]
  8909  
  8910  @code{mpz_jacobi} and @code{mpz_kronecker} are currently implemented with a
  8911  simple binary algorithm similar to that described for the GCDs (@pxref{Binary
  8912  GCD}).  They're not very fast when both inputs are large.  Lehmer's multi-step
  8913  improvement or a binary based multi-step algorithm is likely to be better.
  8914  
  8915  When one operand fits a single limb, and that includes @code{mpz_kronecker_ui}
  8916  and friends, an initial reduction is done with either @code{mpn_mod_1} or
  8917  @code{mpn_modexact_1_odd}, followed by the binary algorithm on a single limb.
  8918  The binary algorithm is well suited to a single limb, and the whole
  8919  calculation in this case is quite efficient.
  8920  
  8921  In all the routines sign changes for the result are accumulated using some bit
  8922  twiddling, avoiding table lookups or conditional jumps.
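
A binary style Jacobi symbol calculation with the sign accumulated by bit
operations might be sketched in Python as follows.  This is a textbook-style
illustration, not the GMP routines: the name @code{jacobi_binary} is
hypothetical, and the sign is kept as a single bit @code{t} that is flipped by
exclusive-or rather than by a table lookup.

```python
def jacobi_binary(a, n):
    """Jacobi symbol (a/n) for a >= 0 and odd n > 0, binary method."""
    if a < 0 or n <= 0 or n % 2 == 0:
        raise ValueError("need a >= 0 and odd n > 0")
    t = 0                              # sign bit: result is (-1)**t
    while a != 0:
        while (a & 1) == 0:
            a >>= 1
            # (2/n) = -1 exactly when n mod 8 is 3 or 5,
            # which is when bits 1 and 2 of n differ.
            t ^= ((n ^ (n >> 1)) >> 1) & 1
        if a < n:
            a, n = n, a
            # reciprocity: flip when both are 3 mod 4 (bit 1 set in both)
            t ^= ((a & n) >> 1) & 1
        a -= n                         # (a/n) = ((a-n)/n); a becomes even
    return 1 - 2 * t if n == 1 else 0
```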
  8923  
  8924  
  8925  @need 1000
  8926  @node Powering Algorithms, Root Extraction Algorithms, Greatest Common Divisor Algorithms, Algorithms
  8927  @section Powering Algorithms
  8928  @cindex Powering algorithms
  8929  
  8930  @menu
  8931  * Normal Powering Algorithm::
  8932  * Modular Powering Algorithm::
  8933  @end menu
  8934  
  8935  
  8936  @node Normal Powering Algorithm, Modular Powering Algorithm, Powering Algorithms, Powering Algorithms
  8937  @subsection Normal Powering
  8938  
  8939  Normal @code{mpz} or @code{mpf} powering uses a simple binary algorithm,
  8940  successively squaring and then multiplying by the base when a 1 bit is seen in
  8941  the exponent, as per Knuth section 4.6.3.  The ``left to right''
  8942  variant described there is used rather than algorithm A, since it's just as
  8943  easy and can be done with somewhat less temporary memory.
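
The left to right binary method can be sketched in a few lines of Python.
This is only an illustration of the squaring and multiplying order, with a
hypothetical function name; it makes no attempt to model the temporary memory
use mentioned above.

```python
def power_left_to_right(base, exponent):
    """base**exponent for exponent >= 0, scanning exponent bits high to low."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    for bit in bin(exponent)[2:]:      # most significant bit first
        result *= result               # square for every bit
        if bit == "1":
            result *= base             # multiply when a 1 bit is seen
    return result
```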
  8944  
  8945  
  8946  @node Modular Powering Algorithm,  , Normal Powering Algorithm, Powering Algorithms
  8947  @subsection Modular Powering
  8948  
  8949  Modular powering is implemented using a @math{2^k}-ary sliding window
  8950  algorithm, as per ``Handbook of Applied Cryptography'' algorithm 14.85
  8951  (@pxref{References}).  @math{k} is chosen according to the size of the
  8952  exponent.  Larger exponents use larger values of @math{k}, the choice being
  8953  made to minimize the average number of multiplications that must supplement
  8954  the squaring.
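
The following Python sketch shows one way a @math{2^k}-ary sliding window
exponentiation can be arranged.  It is an illustration under stated
assumptions, not the GMP code: the function name and the fixed default
@math{k=4} are hypothetical (GMP chooses @math{k} from the exponent size), and
reductions are done with a plain remainder.

```python
def powm_sliding_window(base, exponent, modulus, k=4):
    """(base**exponent) % modulus with a 2**k-ary sliding window, exponent >= 0."""
    if exponent < 0 or modulus <= 0 or k < 1:
        raise ValueError("need exponent >= 0, modulus > 0, k >= 1")
    if modulus == 1:
        return 0
    base %= modulus
    square = base * base % modulus
    odd_powers = [base]                # base**1, base**3, ..., base**(2**k - 1)
    for _ in range((1 << (k - 1)) - 1):
        odd_powers.append(odd_powers[-1] * square % modulus)
    result = 1
    i = exponent.bit_length() - 1
    while i >= 0:
        if not (exponent >> i) & 1:
            result = result * result % modulus   # a 0 bit: just square
            i -= 1
            continue
        j = max(i - k + 1, 0)          # window of at most k bits ...
        while not (exponent >> j) & 1:
            j += 1                     # ... trimmed to end on a 1 bit
        window = (exponent >> j) & ((1 << (i - j + 1)) - 1)
        for _ in range(i - j + 1):
            result = result * result % modulus
        result = result * odd_powers[window >> 1] % modulus
        i = j - 1
    return result
```

Only odd powers are tabulated, since every window is trimmed to end in a 1
bit.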
  8955  
  8956  The modular multiplies and squarings use either a simple division or the REDC
  8957  method by Montgomery (@pxref{References}).  REDC is a little faster,
  8958  essentially saving N single limb divisions in a fashion similar to an exact
  8959  remainder (@pxref{Exact Remainder}).
  8960  
  8961  
  8962  @node Root Extraction Algorithms, Radix Conversion Algorithms, Powering Algorithms, Algorithms
  8963  @section Root Extraction Algorithms
  8964  @cindex Root extraction algorithms
  8965  
  8966  @menu
  8967  * Square Root Algorithm::
  8968  * Nth Root Algorithm::
  8969  * Perfect Square Algorithm::
  8970  * Perfect Power Algorithm::
  8971  @end menu
  8972  
  8973  
  8974  @node Square Root Algorithm, Nth Root Algorithm, Root Extraction Algorithms, Root Extraction Algorithms
  8975  @subsection Square Root
  8976  @cindex Square root algorithm
  8977  @cindex Karatsuba square root algorithm
  8978  
  8979  Square roots are taken using the ``Karatsuba Square Root'' algorithm by Paul
  8980  Zimmermann (@pxref{References}).
  8981  
  8982  An input @math{n} is split into four parts of @math{k} bits each, so with
  8983  @math{b=2^k} we have @m{n = a_3b^3 + a_2b^2 + a_1b + a_0, n = a3*b^3 + a2*b^2
  8984  + a1*b + a0}.  Part @ms{a,3} must be ``normalized'' so that either the high or
  8985  second highest bit is set.  In GMP, @math{k} is kept on a limb boundary and
  8986  the input is left shifted (by an even number of bits) to normalize.
  8987  
  8988  The square root of the high two parts is taken, by recursive application of
  8989  the algorithm (bottoming out in a one-limb Newton's method),
  8990  @tex
  8991  $$ s',r' = \mathop{\rm sqrtrem} \> (a_3b + a_2) $$
  8992  @end tex
  8993  @ifnottex
  8994  
  8995  @example
  8996  s1,r1 = sqrtrem (a3*b + a2)
  8997  @end example
  8998  
  8999  @end ifnottex
  9000  This is an approximation to the desired root and is extended by a division to
  9001  give @math{s},@math{r},
  9002  @tex
  9003  $$\eqalign{
  9004  q,u &= \mathop{\rm divrem} \> (r'b + a_1, 2s') \cr
  9005  s &= s'b + q \cr
  9006  r &= ub + a_0 - q^2
  9007  }$$
  9008  @end tex
  9009  @ifnottex
  9010  
  9011  @example
  9012  q,u = divrem (r1*b + a1, 2*s1)
  9013  s = s1*b + q
  9014  r = u*b + a0 - q^2
  9015  @end example
  9016  
  9017  @end ifnottex
  9018  The normalization requirement on @ms{a,3} means at this point @math{s} is
  9019  either correct or 1 too big.  @math{r} is negative in the latter case, so
  9020  @tex
  9021  $$\eqalign{
  9022  \mathop{\rm if} \; r &< 0 \; \mathop{\rm then} \cr
  9023  r &\leftarrow r + 2s - 1 \cr
  9024  s &\leftarrow s - 1
  9025  }$$
  9026  @end tex
  9027  @ifnottex
  9028  
  9029  @example
  9030  if r < 0 then
  9031    r = r + 2*s - 1
  9032    s = s - 1
  9033  @end example
  9034  
  9035  @end ifnottex
  9036  The algorithm is expressed in a divide and conquer form, but as noted in the
  9037  paper it can also be viewed as a discrete variant of Newton's method, or as a
  9038  variation on the schoolboy method (no longer taught) for square roots two
  9039  digits at a time.
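
The recursion can be sketched in Python as below.  This is a hedged
illustration rather than @code{mpn_sqrtrem}: the function name is
hypothetical, @math{k} is chosen from the bit length so that the top part is
large enough without any shifting (the top part may then hold a few more than
@math{k} bits), and @code{math.isqrt} (Python 3.8+) stands in for the one-limb
base case.

```python
import math

def sqrtrem_sketch(n):
    """Return (s, r) with s = floor(sqrt(n)) and r = n - s*s, for n >= 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < (1 << 64):
        s = math.isqrt(n)              # stand-in for the base case
        return s, n - s * s
    k = (n.bit_length() + 1) // 4      # ensures high part >= b*b/4
    b = 1 << k
    a0 = n & (b - 1)
    a1 = (n >> k) & (b - 1)
    high = n >> (2 * k)                # a3*b + a2
    s1, r1 = sqrtrem_sketch(high)
    q, u = divmod(r1 * b + a1, 2 * s1)
    s = s1 * b + q
    r = u * b + a0 - q * q
    if r < 0:                          # s was 1 too big
        r += 2 * s - 1
        s -= 1
    return s, r
```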
  9040  
  9041  If the remainder @math{r} is not required then usually only a few high limbs
  9042  of @math{r} and @math{u} need to be calculated to determine whether an
  9043  adjustment to @math{s} is required.  This optimization is not currently
  9044  implemented.
  9045  
  9046  In the Karatsuba multiplication range this algorithm is @m{O({3\over2}
  9047  M(N/2)),O(1.5*M(N/2))}, where @math{M(n)} is the time to multiply two numbers
  9048  of @math{n} limbs.  In the FFT multiplication range this grows to a bound of
  9049  @m{O(6 M(N/2)),O(6*M(N/2))}.  In practice a factor of about 1.5 to 1.8 is
  9050  found in the Karatsuba and Toom-3 ranges, growing to 2 or 3 in the FFT range.
  9051  
  9052  The algorithm does all its calculations in integers and the resulting
  9053  @code{mpn_sqrtrem} is used for both @code{mpz_sqrt} and @code{mpf_sqrt}.
  9054  The extended precision given by @code{mpf_sqrt_ui} is obtained by
  9055  padding with zero limbs.
  9056  
  9057  
  9058  @node Nth Root Algorithm, Perfect Square Algorithm, Square Root Algorithm, Root Extraction Algorithms
  9059  @subsection Nth Root
  9060  @cindex Root extraction algorithm
  9061  @cindex Nth root algorithm
  9062  
  9063  Integer Nth roots are taken using Newton's method with the following
  9064  iteration, where @math{A} is the input and @math{n} is the root to be taken.
  9065  @tex
  9066  $$a_{i+1} = {1\over n} \left({A \over a_i^{n-1}} + (n-1)a_i \right)$$
  9067  @end tex
  9068  @ifnottex
  9069  
  9070  @example
  9071           1         A
  9072  a[i+1] = - * ( --------- + (n-1)*a[i] )
  9073           n     a[i]^(n-1)
  9074  @end example
  9075  
  9076  @end ifnottex
  9077  The initial approximation @m{a_1,a[1]} is generated bitwise by successively
  9078  powering a trial root with or without new 1 bits, aiming to be just above the
  9079  true root.  The iteration converges quadratically when started from a good
  9080  approximation.  When @math{n} is large more initial bits are needed to get
  9081  good convergence.  The current implementation is not particularly well
  9082  optimized.
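
The iteration can be illustrated with integer arithmetic in Python as follows.
This is a sketch only: the function name is hypothetical, and the starting
value is simply a power of 2 known to be at or above the true root, which is
an assumption of this example and cruder than the bitwise start described
above.

```python
def nth_root_sketch(A, n):
    """floor(A**(1/n)) for integers A >= 0 and n >= 1, by Newton's method."""
    if A < 0 or n < 1:
        raise ValueError("need A >= 0 and n >= 1")
    if A < 2:
        return A
    a = 1 << -(-A.bit_length() // n)   # 2**ceil(bits/n), at or above the root
    while True:
        nxt = ((n - 1) * a + A // a ** (n - 1)) // n
        if nxt >= a:                   # no longer decreasing: a is the root
            return a
        a = nxt
```

Starting above the root, each iterate stays at or above the floor of the true
root and decreases until that floor is reached.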
  9083  
  9084  
  9085  @node Perfect Square Algorithm, Perfect Power Algorithm, Nth Root Algorithm, Root Extraction Algorithms
  9086  @subsection Perfect Square
  9087  @cindex Perfect square algorithm
  9088  
  9089  A significant fraction of non-squares can be quickly identified by checking
  9090  whether the input is a quadratic residue modulo small integers.
  9091  
  9092  @code{mpz_perfect_square_p} first tests the input mod 256, which means just
  9093  examining the low byte.  Only 44 different values occur for squares mod 256,
  9094  so 82.8% of inputs can be immediately identified as non-squares.
  9095  
  9096  On a 32-bit system similar tests are done mod 9, 5, 7, 13 and 17, for a total
  9097  99.25% of inputs identified as non-squares.  On a 64-bit system 97 is tested
  9098  too, for a total 99.62%.
  9099  
  9100  These moduli are chosen because they're factors of @math{2^@W{24}-1} (or
  9101  @math{2^@W{48}-1} for 64-bits), and such a remainder can be quickly taken just
  9102  using additions (see @code{mpn_mod_34lsub1}).
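
The residue filtering can be illustrated in Python for the 32-bit set of
moduli.  This is a sketch with hypothetical names, not the GMP code: the
tables are built at run time rather than by @file{gen-psqr.c}, and
@code{fold24} is a simple stand-in showing why a remainder modulo a factor of
@math{2^@W{24}-1} needs only additions, namely that summing 24-bit pieces
preserves the value modulo @math{2^@W{24}-1}.

```python
import math

SMALL_MODULI = (9, 5, 7, 13, 17)       # each divides 2**24 - 1

def square_residues(m):
    """Set of values x*x mod m."""
    return frozenset(x * x % m for x in range(m))

SQUARES_MOD_256 = square_residues(256)
SQUARES_MOD = {m: square_residues(m) for m in SMALL_MODULI}

def fold24(n):
    """Sum of the 24-bit pieces of n; congruent to n modulo 2**24 - 1."""
    total = 0
    while n:
        total += n & 0xFFFFFF
        n >>= 24
    return total

def is_perfect_square_sketch(n):
    """True if n is a perfect square, using residue tests before the root."""
    if n < 0:
        return False
    if (n & 0xFF) not in SQUARES_MOD_256:    # low byte test
        return False
    folded = fold24(n)
    for m in SMALL_MODULI:
        if folded % m not in SQUARES_MOD[m]:
            return False
    s = math.isqrt(n)                  # survivors still need a real root
    return s * s == n
```

The table for 256 has 44 entries, and multiplying the fractions of residues
that are squares over all six moduli gives the 99.25% rejection rate quoted
above.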
  9103  
  9104  When nails are in use moduli are instead selected by the @file{gen-psqr.c}
  9105  program and applied with an @code{mpn_mod_1}.  The same @math{2^@W{24}-1} or
  9106  @math{2^@W{48}-1} could be done with nails using some extra bit shifts, but
  9107  this is not currently implemented.
  9108  
  9109  In any case each modulus is applied to the @code{mpn_mod_34lsub1} or
  9110  @code{mpn_mod_1} remainder and a table lookup identifies non-squares.  By
  9111  using a ``modexact'' style calculation, and suitably permuted tables, just one
  9112  multiply each is required, see the code for details.  Moduli are also combined
  9113  to save operations, so long as the lookup tables don't become too big.
  9114  @file{gen-psqr.c} does all the pre-calculations.
  9115  
  9116  A square root must still be taken for any value that passes these tests, to
  9117  verify it's really a square and not one of the small fraction of non-squares
  9118  that get through (i.e.@: a pseudo-square to all the tested bases).
  9119  
  9120  Clearly more residue tests could be done; @code{mpz_perfect_square_p} only
  9121  uses a compact and efficient set.  Big inputs would probably benefit from more
  9122  residue testing, while small inputs might be better off with less.  The assumed
  9123  distribution of squares versus non-squares in the input would affect such
  9124  considerations.
  9125  
  9126  
  9127  @node Perfect Power Algorithm,  , Perfect Square Algorithm, Root Extraction Algorithms
  9128  @subsection Perfect Power
  9129  @cindex Perfect power algorithm
  9130  
  9131  Detecting perfect powers is required by some factorization algorithms.
  9132  Currently @code{mpz_perfect_power_p} is implemented using repeated Nth root
  9133  extractions, though naturally only prime roots need to be considered.
  9134  (@xref{Nth Root Algorithm}.)
  9135  
  9136  If a prime divisor @math{p} with multiplicity @math{e} can be found, then only
  9137  roots which are divisors of @math{e} need to be considered, much reducing the
  9138  work necessary.  To this end divisibility by a set of small primes is checked.
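
A minimal single-word sketch of this approach is given below.  It tries only
prime exponents, and uses the multiplicity of the prime 2 to skip exponents
that cannot work.  The helper names are hypothetical, the root is found by a
plain binary search rather than the Nth root algorithm GMP uses, and 0 and 1
are treated as perfect powers by assumption.

```c
#include <stdint.h>

/* Compute base^exp into *out; return 0 if the power would exceed limit. */
static int bounded_pow(uint64_t base, unsigned exp, uint64_t limit,
                       uint64_t *out)
{
    uint64_t result = 1;
    while (exp-- > 0) {
        if (base != 0 && result > limit / base)
            return 0;
        result *= base;
    }
    *out = result;
    return result <= limit;
}

/* floor(n^(1/e)) for e >= 2, by binary search on the root. */
static uint64_t integer_root(uint64_t n, unsigned e)
{
    uint64_t lo = 0, hi = (uint64_t) 1 << 32, unused;
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (bounded_pow(mid, e, n, &unused))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

int is_perfect_power(uint64_t n)
{
    static const unsigned primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
                                       31, 37, 41, 43, 47, 53, 59, 61 };
    unsigned e2 = 0;
    if (n < 2)
        return 1;
    for (uint64_t m = n; (m & 1) == 0; m >>= 1)
        e2++;                       /* multiplicity of the prime 2 in n */
    for (unsigned i = 0; i < sizeof primes / sizeof primes[0]; i++) {
        uint64_t r, p;
        if (e2 != 0 && e2 % primes[i] != 0)
            continue;               /* exponent must divide e2 */
        r = integer_root(n, primes[i]);
        if (r < 2)
            break;                  /* bigger exponents give root 1 too */
        if (bounded_pow(r, primes[i], n, &p) && p == n)
            return 1;
    }
    return 0;
}
```

For example 1000 has the prime 2 with multiplicity 3, so the square root is
never attempted and only the cube root is extracted.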
  9139  
  9140  
  9141  @node Radix Conversion Algorithms, Other Algorithms, Root Extraction Algorithms, Algorithms
  9142  @section Radix Conversion
  9143  @cindex Radix conversion algorithms
  9144  
  9145  Radix conversions are less important than other algorithms.  A program
  9146  dominated by conversions should probably use a different data representation.
  9147  
  9148  @menu
  9149  * Binary to Radix::
  9150  * Radix to Binary::
  9151  @end menu
  9152  
  9153  
  9154  @node Binary to Radix, Radix to Binary, Radix Conversion Algorithms, Radix Conversion Algorithms
  9155  @subsection Binary to Radix
  9156  
  9157  Conversions from binary to a power-of-2 radix use a simple and fast
  9158  @math{O(N)} bit extraction algorithm.
  9159  
  9160  Conversions from binary to other radices use one of two algorithms.  Sizes
  9161  below @code{GET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method.
  9162  Repeated divisions by @math{b^n} are made, where @math{b} is the radix and
  9163  @math{n} is the biggest power that fits in a limb.  But instead of simply
  9164  using the remainder @math{r} from such divisions, an extra divide step is done
  9165  to give a fractional limb representing @math{r/b^n}.  The digits of @math{r}
  9166  can then be extracted using multiplications by @math{b} rather than divisions.
  9167  Special case code is provided for decimal, allowing multiplications by 10 to
  9168  optimize to shifts and adds.
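
The fractional limb idea can be sketched for decimal with a 32-bit remainder
@math{r < 10^9} as follows.  One division produces a 32-bit fixed-point
approximation of @math{r/10^9}, rounded up, and the nine digits then come out
of multiplications by 10.  The function name is hypothetical and the code is
only an illustration of the technique, not GMP's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Write the 9 decimal digits of r (with leading zeros) into out. */
void remainder_to_digits(uint32_t r, char out[10])
{
    assert(r < 1000000000u);

    /* Extra divide step: frac approximates r/10^9 as a fraction of 2^32.
       Adding 1 rounds up, so truncation can never lose a digit; the error
       stays too small to disturb any of the 9 digits. */
    uint64_t frac = (((uint64_t) r << 32) / 1000000000u) + 1;

    for (int i = 0; i < 9; i++) {
        frac *= 10;                            /* integer part = next digit */
        out[i] = (char) ('0' + (frac >> 32));
        frac &= 0xFFFFFFFFu;                   /* keep only the fraction */
    }
    out[9] = '\0';
}
```

The digits are produced from most to least significant, with no division
inside the loop.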
  9169  
  9170  Above @code{GET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
  9171  For an input @math{t}, powers @m{b^{n2^i},b^(n*2^i)} of the radix are
  9172  calculated, until a power between @math{t} and @m{\sqrt{t},sqrt(t)} is
  9173  reached.  @math{t} is then divided by that largest power, giving a quotient
  9174  which is the digits above that power, and a remainder which is those below.
  9175  These two parts are in turn divided by the second highest power, and so on
  9176  recursively.  When a piece has been divided down to less than
  9177  @code{GET_STR_DC_THRESHOLD} limbs, the basecase algorithm described above is
  9178  used.
  9179  
  9180  The advantage of this algorithm is that big divisions can make use of the
  9181  sub-quadratic divide and conquer division (@pxref{Divide and Conquer
  9182  Division}), and big divisions tend to have lower overhead than lots of
  9183  separate single limb divisions anyway.  But in any case the cost of
  9184  calculating the powers @m{b^{n2^i},b^(n*2^i)} must first be overcome.
  9185  
  9186  @code{GET_STR_PRECOMPUTE_THRESHOLD} and @code{GET_STR_DC_THRESHOLD} represent
  9187  the same basic thing, the point where it becomes worth doing a big division to
  9188  cut the input in half.  @code{GET_STR_PRECOMPUTE_THRESHOLD} includes the cost
  9189  of calculating the radix power required, whereas @code{GET_STR_DC_THRESHOLD}
  9190  assumes that's already available, which is the case when recursing.
  9191  
  9192  Since the base case produces digits from least to most significant but they
  9193  want to be stored from most to least, it's necessary to calculate in advance
  9194  how many digits there will be, or at least be sure not to underestimate that.
  9195  For GMP the number of input bits is multiplied by @code{chars_per_bit_exactly}
  9196  from @code{mp_bases}, rounding up.  The result is either correct or one too
  9197  big.
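
The estimate can be illustrated for a single 64-bit value and base 10 as
follows.  The constant plays the role of @code{chars_per_bit_exactly}; its
name, and the function names, are hypothetical.  Taking the integer part and
adding 1 is used here as the rounding up, which is an assumption of this
sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* log(2)/log(10), the number of decimal digits per bit. */
static const double CHARS_PER_BIT_DECIMAL = 0.30102999566398120;

static size_t bit_length(uint64_t x)
{
    size_t n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Upper estimate of the number of decimal digits of x. */
size_t estimate_decimal_digits(uint64_t x)
{
    if (x == 0)
        return 1;
    return (size_t) (bit_length(x) * CHARS_PER_BIT_DECIMAL) + 1;
}

/* Exact digit count, for comparison only. */
size_t exact_decimal_digits(uint64_t x)
{
    size_t n = 1;
    while (x >= 10) {
        x /= 10;
        n++;
    }
    return n;
}
```

Both 999 and 1000 are 10-bit numbers, so both get an estimate of 4 digits,
which is one too big for 999 and exact for 1000.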
  9198  
  9199  Examining some of the high bits of the input could increase the chance of
  9200  getting the exact number of digits, but an exact result every time would not
  9201  be practical, since in general the difference between numbers 100@dots{} and
  9202  99@dots{} is only in the last few bits and the work to identify 99@dots{}
  9203  might well be almost as much as a full conversion.
  9204  
  9205  The @math{r/b^n} scheme described above for using multiplications to bring out
  9206  digits might be useful for more than a single limb.  Some brief experiments
  9207  with it on the base case when recursing didn't give a noticeable improvement,
  9208  but perhaps that was only due to the implementation.  Something similar would
  9209  work for the sub-quadratic divisions too, though there would be the cost of
  9210  calculating a bigger radix power.
  9211  
  9212  Another possible improvement for the sub-quadratic part would be to arrange
  9213  for radix powers that balanced the sizes of quotient and remainder produced,
  9214  i.e.@: the highest power would be an @m{b^{nk},b^(n*k)} approximately equal to
  9215  @m{\sqrt{t},sqrt(t)}, not restricted to a @math{2^i} factor.  That ought to
  9216  smooth out a graph of times against sizes, but may or may not be a net
  9217  speedup.
  9218  
  9219  
  9220  @node Radix to Binary,  , Binary to Radix, Radix Conversion Algorithms
  9221  @subsection Radix to Binary
  9222  
  9223  @strong{This section needs to be rewritten, it currently describes the
  9224  algorithms used before GMP 4.3.}
  9225  
  9226  Conversions from a power-of-2 radix into binary use a simple and fast
  9227  @math{O(N)} bitwise concatenation algorithm.
  9228  
  9229  Conversions from other radices use one of two algorithms.  Sizes below
  9230  @code{SET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method.  Groups
  9231  of @math{n} digits are converted to limbs, where @math{n} is the biggest
  9232  power of the base @math{b} which will fit in a limb, then those groups are
  9233  accumulated into the result by multiplying by @math{b^n} and adding.  This
  9234  saves multi-precision operations, as per Knuth section 4.4 part E
  9235  (@pxref{References}).  Some special case code is provided for decimal, giving
  9236  the compiler a chance to optimize multiplications by 10.
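
The basecase accumulation can be sketched for decimal input and 32-bit limbs
as below, with 9 digits per group since @math{10^9} is the biggest power of
10 fitting in 32 bits.  The function names and the little-endian limb array
are assumptions of this sketch, not GMP's interfaces, and the input is assumed
to contain only the characters 0 to 9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* rp[0..n-1] = rp[0..n-1] * m + a; return the new limb count. */
static size_t mul_1_add(uint32_t *rp, size_t n, uint32_t m, uint32_t a)
{
    uint64_t carry = a;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t) rp[i] * m + carry;
        rp[i] = (uint32_t) t;
        carry = t >> 32;
    }
    if (carry != 0)
        rp[n++] = (uint32_t) carry;
    return n;
}

/* Convert decimal string s into little-endian limbs at rp, which has room
   for cap limbs.  Returns the number of limbs used (0 for the value zero). */
size_t decimal_to_limbs(uint32_t *rp, size_t cap, const char *s)
{
    size_t len = strlen(s), pos = 0, n = 0;
    size_t first = len % 9;            /* short leading group, if any */

    while (pos < len) {
        size_t take = (pos == 0 && first != 0) ? first : 9;
        uint32_t group = 0, scale = 1;
        for (size_t j = 0; j < take; j++) {
            group = group * 10 + (uint32_t) (s[pos + j] - '0');
            scale *= 10;               /* ends up as 10^take */
        }
        pos += take;
        assert(n < cap);               /* room for one more limb */
        n = mul_1_add(rp, n, scale, group);
    }
    return n;
}
```

Each group costs one single-limb multiply-and-add pass over the result so far,
which is where the @math{O(N^2)} behaviour comes from.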
  9237  
  9238  Above @code{SET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
  9239  First groups of @math{n} digits are converted into limbs.  Then adjacent
  9240  limbs are combined into limb pairs with @m{xb^n+y,x*b^n+y}, where @math{x}
  9241  and @math{y} are the limbs.  Adjacent limb pairs are combined into quads
  9242  similarly with @m{xb^{2n}+y,x*b^(2n)+y}.  This continues until a single block
  9243  remains, that being the result.
  9244  
  9245  The advantage of this method is that the multiplications for each @math{x} are
  9246  big blocks, allowing Karatsuba and higher algorithms to be used.  But the cost
  9247  of calculating the powers @m{b^{n2^i},b^(n*2^i)} must be overcome.
  9248  @code{SET_STR_PRECOMPUTE_THRESHOLD} usually ends up quite big, around 5000 digits, and on
  9249  some processors much bigger still.
  9250  
  9251  @code{SET_STR_PRECOMPUTE_THRESHOLD} is based on the input digits (and tuned
  9252  for decimal), though it might be better based on a limb count, so as to be
  9253  independent of the base.  But that sort of count isn't used by the base case
  9254  and so would need some sort of initial calculation or estimate.
  9255  
  9256  The main reason @code{SET_STR_PRECOMPUTE_THRESHOLD} is so much bigger than the
  9257  corresponding @code{GET_STR_PRECOMPUTE_THRESHOLD} is that @code{mpn_mul_1} is
  9258  much faster than @code{mpn_divrem_1} (often by a factor of 5, or more).
  9259  
  9260  
  9261  @need 1000
  9262  @node Other Algorithms, Assembly Coding, Radix Conversion Algorithms, Algorithms
  9263  @section Other Algorithms
  9264  
  9265  @menu
  9266  * Prime Testing Algorithm::
  9267  * Factorial Algorithm::
  9268  * Binomial Coefficients Algorithm::
  9269  * Fibonacci Numbers Algorithm::
  9270  * Lucas Numbers Algorithm::
  9271  * Random Number Algorithms::
  9272  @end menu
  9273  
  9274  
  9275  @node Prime Testing Algorithm, Factorial Algorithm, Other Algorithms, Other Algorithms
  9276  @subsection Prime Testing
  9277  @cindex Prime testing algorithms
  9278  
  9279  The primality testing in @code{mpz_probab_prime_p} (@pxref{Number Theoretic
  9280  Functions}) first does some trial division by small factors and then uses the
  9281  Miller-Rabin probabilistic primality testing algorithm, as described in Knuth
  9282  section 4.5.4 algorithm P (@pxref{References}).
  9283  
  9284  For an odd input @math{n}, and with @math{n = q@GMPmultiply{}2^k+1} where
  9285  @math{q} is odd, this algorithm selects a random base @math{x} and tests
  9286  whether @math{x^q @bmod{} n} is 1 or @math{-1}, or an @m{x^{q2^j} \bmod n,
  9287  x^(q*2^j) mod n} is @math{-1}, for some @math{1@le{}j<k}.  If so then @math{n}
  9288  is probably prime, if not then @math{n} is definitely composite.
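
A single round of the test can be sketched for 32-bit inputs as below.  The
sketch accepts @math{n} when @math{x^q} is 1 or @math{-1}, or when a later
squaring reaches @math{-1}, which is the usual strong probable prime condition.
The function names are hypothetical, the base is passed in rather than chosen
at random, and this is not the multi-precision code GMP uses.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n)
{
    return (uint32_t) (((uint64_t) a * b) % n);
}

static uint32_t powmod(uint32_t x, uint32_t e, uint32_t n)
{
    uint32_t result = 1 % n;
    x %= n;
    while (e != 0) {
        if (e & 1)
            result = mulmod(result, x, n);
        x = mulmod(x, x, n);
        e >>= 1;
    }
    return result;
}

/* For odd n > 2 and a base x with 1 < x < n-1, return 1 if n is a strong
   probable prime to base x, or 0 if n is definitely composite. */
int strong_probable_prime(uint32_t n, uint32_t x)
{
    assert(n > 2 && (n & 1) == 1);

    uint32_t q = n - 1;
    unsigned k = 0;
    while ((q & 1) == 0) {          /* n - 1 = q * 2^k with q odd */
        q >>= 1;
        k++;
    }

    uint32_t y = powmod(x, q, n);
    if (y == 1 || y == n - 1)
        return 1;
    for (unsigned j = 1; j < k; j++) {
        y = mulmod(y, y, n);        /* y = x^(q*2^j) mod n */
        if (y == n - 1)
            return 1;
        if (y == 1)
            return 0;               /* nontrivial square root of 1 */
    }
    return 0;
}
```

For instance 2047 = 23*89 passes with base 2 but is exposed as composite by
base 3, which is why several bases are normally tried.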
  9289  
  9290  Any prime @math{n} will pass the test, but some composites do too.  Such
  9291  composites are known as strong pseudoprimes to base @math{x}.  No @math{n} is
  9292  a strong pseudoprime to more than @math{1/4} of all bases (see Knuth exercise
  9293  22), hence with @math{x} chosen at random there's no more than a @math{1/4}
  9294  chance a ``probable prime'' will in fact be composite.
  9295  
  9296  In fact strong pseudoprimes are quite rare, making the test much more
  9297  powerful than this analysis would suggest, but @math{1/4} is all that's proven
  9298  for an arbitrary @math{n}.
  9299  
  9300  
  9301  @node Factorial Algorithm, Binomial Coefficients Algorithm, Prime Testing Algorithm, Other Algorithms
  9302  @subsection Factorial
  9303  @cindex Factorial algorithm
  9304  
  9305  Factorials are calculated by a combination of two algorithms.  Both share one
  9306  idea: compute the odd part of the factorial first, then account for the power
  9307  of @math{2} in a final step, by shifting.
  9308  
  9309  For small @math{n}, the odd factor of @math{n!} is computed with the simple
  9310  observation that it is equal to the product of all positive odd numbers not
  9311  exceeding @math{n} times the odd factor of @m{\lfloor n/2\rfloor!, [n/2]!},
  9312  where @m{\lfloor x\rfloor, [x]} is the integer part of @math{x}, and so on
  9313  recursively.  The procedure is best illustrated with an example,
  9314  
  9315  @quotation
  9316  @math{23! = (23.21.19.17.15.13.11.9.7.5.3)(11.9.7.5.3)(5.3)2^{19}}
  9317  @end quotation
  9318  
  9319  The current code collects all the factors in a single list, with a loop and no
  9320  recursion, and computes the product, with no special care for repeated chunks.
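
The small-@math{n} scheme can be sketched in a single machine word as follows,
valid for @math{n} up to 20.  The function names are hypothetical.  The shift
count uses the standard fact that the power of 2 dividing @math{n!} is
@math{n} minus the number of 1 bits in @math{n}, which gives 19 for the
@math{23!} example.

```c
#include <assert.h>
#include <stdint.h>

/* Odd part of n!: the odd numbers up to n, times those up to n/2, and so
   on, collected with a loop rather than recursion. */
uint64_t odd_part_of_factorial(unsigned n)
{
    uint64_t product = 1;
    for (; n > 1; n /= 2)
        for (unsigned f = 3; f <= n; f += 2)
            product *= f;
    return product;
}

/* n! for n <= 20, as the odd part shifted by the power of 2. */
uint64_t factorial(unsigned n)
{
    assert(n <= 20);
    unsigned ones = 0;
    for (unsigned m = n; m != 0; m >>= 1)
        ones += m & 1;
    return odd_part_of_factorial(n) << (n - ones);
}
```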
  9321  
  9322  When @math{n} is larger, the computation goes through prime sieving.  A helper
  9323  function is used, as suggested by Peter Luschny:
  9324  @tex
  9325  $$\mathop{\rm msf}(n) = {n!\over\lfloor n/2\rfloor!^2\cdot2^k} = \prod_{p=3}^{n}
  9326  p^{\mathop{\rm L}(p,n)} $$
  9327  @end tex
  9328  @ifnottex
  9329  
  9330  @example
  9331                              n
  9332                            -----
  9333                 n!          | |   L(p,n)
  9334  msf(n) = -------------- =  | |  p
  9335            [n/2]!^2.2^k     p=3
  9336  @end example
  9337  @end ifnottex
  9338  
  9339  Here @math{p} ranges over the odd primes.  The exponent @math{k} is chosen to
  9340  make the result an odd integer: @math{k} is the number of 1 bits in the binary
  9341  representation of @m{\lfloor n/2\rfloor, [n/2]}.  The function L@math{(p,n)}
  9342  can be defined as zero when @math{p} is composite, and, for any prime
  9343  @math{p}, it is computed with:
  9344  @tex
  9345  $$\mathop{\rm L}(p,n) = \sum_{i>0}\left\lfloor{n\over p^i}\right\rfloor\bmod2
  9346  \leq\log_p(n)$$
  9347  @end tex
  9348  @ifnottex
  9349  
  9350  @example
  9351            ---
  9352             \    n
  9353  L(p,n) =   /  [---] mod 2   <=  log (n) .
  9354            ---  p^i                p
  9355            i>0
  9356  @end example
  9357  @end ifnottex
  9358  
  9359  With this helper function, we are able to compute the odd part of @math{n!}
  9360  using the recursion implied by @m{n!=\lfloor n/2\rfloor!^2\cdot\mathop{\rm
  9361  msf}(n)\cdot2^k , n!=[n/2]!^2*msf(n)*2^k}. The recursion stops using the
  9362  small-@math{n} algorithm on some @m{\lfloor n/2^i\rfloor, [n/2^i]}.
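
The helper function and the recursion can be sketched in a single machine word
as below.  The function names are hypothetical, primes are found by trial
division instead of a sieve, and the recursion runs all the way down to 1
rather than switching to the small-@math{n} algorithm, so this is only an
illustration of the formulas.

```c
#include <stdint.h>

/* L(p,n): the sum over i > 0 of (floor(n / p^i) mod 2). */
unsigned swing_exponent(unsigned p, unsigned n)
{
    unsigned sum = 0;
    for (unsigned q = n / p; q != 0; q /= p)
        sum += q & 1;
    return sum;
}

static int is_prime(unsigned p)
{
    if (p < 2)
        return 0;
    for (unsigned d = 2; d * d <= p; d++)
        if (p % d == 0)
            return 0;
    return 1;
}

/* msf(n): product over odd primes p <= n of p^L(p,n). */
uint64_t msf(unsigned n)
{
    uint64_t product = 1;
    for (unsigned p = 3; p <= n; p += 2)
        if (is_prime(p))
            for (unsigned e = swing_exponent(p, n); e != 0; e--)
                product *= p;
    return product;
}

/* Odd part of n!, from n! = [n/2]!^2 * msf(n) * 2^k.  Small n only. */
uint64_t odd_factorial_by_swing(unsigned n)
{
    if (n < 2)
        return 1;
    uint64_t half = odd_factorial_by_swing(n / 2);
    return half * half * msf(n);
}
```

For @math{n=10} this gives msf(10) = 3^2*7 = 63 and an odd part of
15^2*63 = 14175, which is 10! divided by 2^8.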
  9363  
  9364  Both the above algorithms use binary splitting to compute the product of many
  9365  small factors. At first as many products as possible are accumulated in a
  9366  single register, generating a list of factors that fit in a machine word. This
  9367  list is then split into halves, and the product is computed recursively.
  9368  
  9369  Such splitting is more efficient than repeated N@cross{}1 multiplies since it
  9370  forms big multiplies, allowing Karatsuba and higher algorithms to be used.
  9371  And even below the Karatsuba threshold a big block of work can be more
  9372  efficient for the basecase algorithm.
  9373  
  9374  
  9375  @node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
  9376  @subsection Binomial Coefficients
  9377  @cindex Binomial coefficient algorithm
  9378  
  9379  Binomial coefficients @m{\left({n}\atop{k}\right), C(n@C{}k)} are calculated
  9380  by first arranging @math{k @le{} n/2} using @m{\left({n}\atop{k}\right) =
  9381  \left({n}\atop{n-k}\right), C(n@C{}k) = C(n@C{}n-k)} if necessary, and then
  9382  evaluating the following product simply from @math{i=2} to @math{i=k}.
  9383  @tex
  9384  $$ \left({n}\atop{k}\right) = (n-k+1) \prod_{i=2}^{k} {{n-k+i} \over i} $$
  9385  @end tex
  9386  @ifnottex
  9387  
  9388  @example
  9389                        k  (n-k+i)
  9390  C(n,k) =  (n-k+1) * prod -------
  9391                       i=2    i
  9392  @end example
  9393  
  9394  @end ifnottex
  9395  It's easy to show that each denominator @math{i} will divide the product so
  9396  far, so the exact division algorithm is used (@pxref{Exact Division}).
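
The product can be sketched in a single machine word as follows.  The function
name is hypothetical, an ordinary division stands in for the exact division
algorithm, and the intermediate products are assumed not to overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* C(n,k) by the running product, dividing exactly at each step. */
uint64_t binomial(unsigned n, unsigned k)
{
    assert(k <= n);
    if (k > n - k)
        k = n - k;                  /* arrange k <= n/2 */
    if (k == 0)
        return 1;

    uint64_t result = (uint64_t) n - k + 1;
    for (unsigned i = 2; i <= k; i++) {
        result *= (uint64_t) n - k + i;
        result /= i;                /* exact: result is now C(n-k+i, i) */
    }
    return result;
}
```

The division is exact because the running value after step @math{i} is itself
the binomial coefficient C(n-k+i,i).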
  9397  
  9398  The numerators @math{n-k+i} and denominators @math{i} are first accumulated
  9399  into products of as many as fit in a limb, to save multi-precision operations, though for
  9400  @code{mpz_bin_ui} this applies only to the divisors, since @math{n} is an
  9401  @code{mpz_t} and @math{n-k+i} in general won't fit in a limb at all.
  9402  
  9403  
  9404  @node Fibonacci Numbers Algorithm, Lucas Numbers Algorithm, Binomial Coefficients Algorithm, Other Algorithms
  9405  @subsection Fibonacci Numbers
  9406  @cindex Fibonacci number algorithm
  9407  
  9408  The Fibonacci functions @code{mpz_fib_ui} and @code{mpz_fib2_ui} are designed
  9409  for calculating isolated @m{F_n,F[n]} or @m{F_n,F[n]},@m{F_{n-1},F[n-1]}
  9410  values efficiently.
  9411  
  9412  For small @math{n}, a table of single limb values in @code{__gmp_fib_table} is
  9413  used.  On a 32-bit limb this goes up to @m{F_{47},F[47]}, or on a 64-bit limb
  9414  up to @m{F_{93},F[93]}.  For convenience the table starts at @m{F_{-1},F[-1]}.
  9415  
  9416  Beyond the table, values are generated with a binary powering algorithm,
  9417  calculating a pair @m{F_n,F[n]} and @m{F_{n-1},F[n-1]} working from high to
  9418  low across the bits of @math{n}.  The formulas used are
  9419  @tex
  9420  $$\eqalign{
  9421    F_{2k+1} &= 4F_k^2 - F_{k-1}^2 + 2(-1)^k \cr
  9422    F_{2k-1} &=  F_k^2 + F_{k-1}^2           \cr
  9423    F_{2k}   &= F_{2k+1} - F_{2k-1}
  9424  }$$
  9425  @end tex
  9426  @ifnottex
  9427  
  9428  @example
  9429  F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
  9430  F[2k-1] =   F[k]^2 + F[k-1]^2
  9431  
  9432  F[2k] = F[2k+1] - F[2k-1]
  9433  @end example
  9434  
  9435  @end ifnottex
  9436  At each step, @math{k} is the high @math{b} bits of @math{n}.  If the next bit
  9437  of @math{n} is 0 then @m{F_{2k},F[2k]},@m{F_{2k-1},F[2k-1]} is used, or if
  9438  it's a 1 then @m{F_{2k+1},F[2k+1]},@m{F_{2k},F[2k]} is used, and the process
  9439  repeated until all bits of @math{n} are incorporated.  Notice these formulas
  9440  require just two squares per bit of @math{n}.
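
The powering step can be sketched in a single 64-bit word, so for @math{n} up
to 93, as follows.  The function name is hypothetical and no table is used.
The sketch starts from @math{k=0}, meaning F[0]=0 and F[-1]=1, and relies on
unsigned arithmetic wrapping mod @math{2^64} for the intermediate values.

```c
#include <assert.h>
#include <stdint.h>

/* Set *fn = F[n] and *fnsub1 = F[n-1], for n <= 93. */
void fib2(unsigned n, uint64_t *fn, uint64_t *fnsub1)
{
    assert(n <= 93);

    uint64_t f = 0, f1 = 1;         /* F[k] and F[k-1], starting at k = 0 */
    int k_odd = 0;

    for (int bit = 6; bit >= 0; bit--) {        /* 93 < 2^7 */
        uint64_t sq = f * f, sq1 = f1 * f1;     /* the two squares */
        uint64_t f2k_plus1 = 4 * sq - sq1;
        f2k_plus1 = k_odd ? f2k_plus1 - 2 : f2k_plus1 + 2;
        uint64_t f2k_sub1 = sq + sq1;
        uint64_t f2k = f2k_plus1 - f2k_sub1;

        if ((n >> bit) & 1) {       /* k becomes 2k+1 */
            f = f2k_plus1;
            f1 = f2k;
            k_odd = 1;
        } else {                    /* k becomes 2k */
            f = f2k;
            f1 = f2k_sub1;
            k_odd = 0;
        }
    }
    *fn = f;
    *fnsub1 = f1;
}
```

Leading zero bits of @math{n} are harmless here, since the step maps the pair
for @math{k=0} back to itself.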
  9441  
  9442  It'd be possible to handle the first few @math{n} above the single limb table
  9443  with simple additions, using the defining Fibonacci recurrence @m{F_{k+1} =
  9444  F_k + F_{k-1}, F[k+1]=F[k]+F[k-1]}, but this is not done since it usually
  9445  turns out to be faster for only about 10 or 20 values of @math{n}, and
  9446  including a block of code for just those doesn't seem worthwhile.  If they
  9447  really mattered it'd be better to extend the data table.
  9448  
  9449  Using a table avoids lots of calculations on small numbers, and makes small
  9450  @math{n} go fast.  A bigger table would make more small @math{n} go fast, it's
  9451  just a question of balancing size against desired speed.  For GMP the code is
  9452  kept compact, with the emphasis primarily on a good powering algorithm.
  9453  
  9454  @code{mpz_fib2_ui} returns both @m{F_n,F[n]} and @m{F_{n-1},F[n-1]}, but
  9455  @code{mpz_fib_ui} is only interested in @m{F_n,F[n]}.  In this case the last
  9456  step of the algorithm can become one multiply instead of two squares.  One of
  9457  the following two formulas is used, according to whether @math{n} is even or odd.
  9458  @tex
  9459  $$\eqalign{
  9460    F_{2k}   &= F_k (F_k + 2F_{k-1}) \cr
  9461    F_{2k+1} &= (2F_k + F_{k-1}) (2F_k - F_{k-1}) + 2(-1)^k
  9462  }$$
  9463  @end tex
  9464  @ifnottex
  9465  
  9466  @example
  9467  F[2k]   = F[k]*(F[k]+2F[k-1])
  9468  
  9469  F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k
  9470  @end example
  9471  
  9472  @end ifnottex
  9473  @m{F_{2k+1},F[2k+1]} here is the same as above, just rearranged to be a
  9474  multiply.  For interest, the @m{2(-1)^k, 2*(-1)^k} term both here and above
  9475  can be applied just to the low limb of the calculation, without a carry or
  9476  borrow into further limbs, which saves some code size.  See comments with
  9477  @code{mpz_fib_ui} and the internal @code{mpn_fib2_ui} for how this is done.
  9478  
  9479  
  9480  @node Lucas Numbers Algorithm, Random Number Algorithms, Fibonacci Numbers Algorithm, Other Algorithms
  9481  @subsection Lucas Numbers
  9482  @cindex Lucas number algorithm
  9483  
  9484  @code{mpz_lucnum2_ui} derives a pair of Lucas numbers from a pair of Fibonacci
  9485  numbers with the following simple formulas.
  9486  @tex
  9487  $$\eqalign{
  9488    L_k     &=  F_k + 2F_{k-1} \cr
  9489    L_{k-1} &= 2F_k -  F_{k-1}
  9490  }$$
  9491  @end tex
  9492  @ifnottex
  9493  
  9494  @example
  9495  L[k]   =   F[k] + 2*F[k-1]
  9496  L[k-1] = 2*F[k] -   F[k-1]
  9497  @end example
  9498  
  9499  @end ifnottex
  9500  @code{mpz_lucnum_ui} is only interested in @m{L_n,L[n]}, and some work can be
  9501  saved.  Trailing zero bits on @math{n} can be handled with a single square
  9502  each.
  9503  @tex
  9504  $$ L_{2k} = L_k^2 - 2(-1)^k $$
  9505  @end tex
  9506  @ifnottex
  9507  
  9508  @example
  9509  L[2k] = L[k]^2 - 2*(-1)^k
  9510  @end example
  9511  
  9512  @end ifnottex
  9513  And the lowest 1 bit can be handled with one multiply of a pair of Fibonacci
  9514  numbers, similar to what @code{mpz_fib_ui} does.
  9515  @tex
  9516  $$ L_{2k+1} = 5F_{k-1} (2F_k + F_{k-1}) - 4(-1)^k $$
  9517  @end tex
  9518  @ifnottex
  9519  
  9520  @example
  9521  L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k
  9522  @end example
  9523  
  9524  @end ifnottex
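
The Lucas formulas above can be written out for single-word values as
follows.  The function names are hypothetical, the Fibonacci inputs are assumed
to be supplied by the caller, and results are assumed to fit in 64 bits.

```c
#include <stdint.h>

/* L[k] and L[k-1] from F[k] and F[k-1]. */
void lucas_pair_from_fib(uint64_t fk, uint64_t fksub1,
                         uint64_t *lk, uint64_t *lksub1)
{
    *lk = fk + 2 * fksub1;
    *lksub1 = 2 * fk - fksub1;
}

/* L[2k] from L[k], one square. */
uint64_t lucas_double(uint64_t lk, unsigned k)
{
    uint64_t sq = lk * lk;
    return (k & 1) ? sq + 2 : sq - 2;
}

/* L[2k+1] from F[k] and F[k-1], one multiply. */
uint64_t lucas_double_plus_1(uint64_t fk, uint64_t fksub1, unsigned k)
{
    uint64_t t = 5 * fksub1 * (2 * fk + fksub1);
    return (k & 1) ? t + 4 : t - 4;
}
```

With F[5]=5 and F[4]=3 these give L[5]=11, L[4]=7, L[10]=123 and L[11]=199.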
  9525  
  9526  
  9527  @node Random Number Algorithms,  , Lucas Numbers Algorithm, Other Algorithms
  9528  @subsection Random Numbers
  9529  @cindex Random number algorithms
  9530  
  9531  For the @code{urandomb} functions, random numbers are generated simply by
  9532  concatenating bits produced by the generator.  As long as the generator has
  9533  good randomness properties this will produce well-distributed @math{N} bit
  9534  numbers.
  9535  
  9536  For the @code{urandomm} functions, random numbers in a range @math{0@le{}R<N}
  9537  are generated by taking values @math{R} of @m{\lceil \log_2 N \rceil,
  9538  ceil(log2(N))} bits each until one satisfies @math{R<N}.  This will normally
  9539  require only one or two attempts, but the attempts are limited in case the
  9540  generator is somehow degenerate and produces only 1 bits or similar.
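
The retry loop can be sketched for 32-bit values as below.  The xorshift bit
source, the function names, the attempt limit of 80 and the fallback taken
when the limit is reached are all assumptions of this sketch, chosen only to
make it self-contained.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_ATTEMPTS = 80 };     /* hypothetical limit on retries */

/* Stand-in bit source (xorshift32); returns the top nbits bits. */
static uint32_t next_bits(uint32_t *state, unsigned nbits)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return nbits == 0 ? 0 : x >> (32 - nbits);
}

/* Return a value R with 0 <= R < n. */
uint32_t uniform_below(uint32_t n, uint32_t *state)
{
    assert(n > 0 && *state != 0);

    unsigned nbits = 0;             /* ceil(log2(n)) */
    while (nbits < 32 && ((uint64_t) 1 << nbits) < n)
        nbits++;

    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        uint32_t r = next_bits(state, nbits);
        if (r < n)
            return r;
    }
    /* Degenerate generator: give up on uniformity and just reduce. */
    return next_bits(state, nbits) % n;
}
```

Since @math{n} is more than half of 2^nbits, each attempt succeeds with
probability above 1/2 when the generator behaves well.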
  9541  
  9542  @cindex Mersenne twister algorithm
  9543  The Mersenne Twister generator is by Matsumoto and Nishimura
  9544  (@pxref{References}).  It has a non-repeating period of @math{2^@W{19937}-1},
  9545  which is a Mersenne prime, hence the name of the generator.  The state is 624
  9546  words of 32 bits each, which is iterated with one XOR and shift for each
  9547  32-bit word generated, making the algorithm very fast.  Randomness properties
  9548  are also very good and this is the default algorithm used by GMP.
  9549  
  9550  @cindex Linear congruential algorithm
  9551  Linear congruential generators are described in many text books, for instance
  9552  Knuth volume 2 (@pxref{References}).  With a modulus @math{M} and parameters
  9553  @math{A} and @math{C}, an integer state @math{S} is iterated by the formula
  9554  @math{S @leftarrow{} A@GMPmultiply{}S+C @bmod{} M}.  At each step the new
  9555  state is a linear function of the previous, mod @math{M}, hence the name of
  9556  the generator.
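
The iteration can be sketched for a modulus @math{2^N} with @math{N} up to 64
as follows, where the reduction is just a mask.  The structure and function
names are hypothetical and no particular choice of @math{A} and @math{C} is
implied.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t a, c;      /* multiplier and increment */
    uint64_t mask;      /* 2^N - 1, so "mod 2^N" is a bitwise AND */
    uint64_t state;
} lcg_t;

void lcg_init(lcg_t *g, uint64_t a, uint64_t c, unsigned nbits, uint64_t seed)
{
    assert(nbits >= 1 && nbits <= 64);
    g->mask = (nbits == 64) ? UINT64_MAX : (((uint64_t) 1 << nbits) - 1);
    g->a = a;
    g->c = c;
    g->state = seed & g->mask;
}

/* S <- (A*S + C) mod 2^N; returns the new state. */
uint64_t lcg_next(lcg_t *g)
{
    g->state = (g->a * g->state + g->c) & g->mask;
    return g->state;
}
```

With @math{A=5}, @math{C=3}, @math{N=4} and a starting state of 7, the first
two states are 6 and 1.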
  9557  
  9558  In GMP only moduli of the form @math{2^N} are supported, and the current
  9559  implementation is not as well optimized as it could be.  Overheads are
  9560  significant when @math{N} is small, and when @math{N} is large clearly the
  9561  multiply at each step will become slow.  This is not a big concern, since the
  9562  Mersenne Twister generator is better in every respect and is therefore
  9563  recommended for all normal applications.
  9564  
  9565  For both generators the current state can be deduced by observing enough
  9566  output and applying some linear algebra (over GF(2) in the case of the
  9567  Mersenne Twister).  This generally means raw output is unsuitable for
  9568  cryptographic applications without further hashing or the like.
  9569  
  9570  
  9571  @node Assembly Coding,  , Other Algorithms, Algorithms
  9572  @section Assembly Coding
  9573  @cindex Assembly coding
  9574  
  9575  The assembly subroutines in GMP are the most significant source of speed at
  9576  small to moderate sizes.  At larger sizes algorithm selection becomes more
  9577  important, but of course speedups in low level routines will still speed up
  9578  everything proportionally.
  9579  
  9580  Carry handling and widening multiplies that are important for GMP can't be
  9581  easily expressed in C@.  GCC @code{asm} blocks help a lot and are provided in
  9582  @file{longlong.h}, but hand coding low level routines invariably offers a
  9583  speedup over generic C by a factor of anything from 2 to 10.
  9584  
  9585  @menu
  9586  * Assembly Code Organisation::
  9587  * Assembly Basics::
  9588  * Assembly Carry Propagation::
  9589  * Assembly Cache Handling::
  9590  * Assembly Functional Units::
  9591  * Assembly Floating Point::
  9592  * Assembly SIMD Instructions::
  9593  * Assembly Software Pipelining::
  9594  * Assembly Loop Unrolling::
  9595  * Assembly Writing Guide::
  9596  @end menu
  9597  
  9598  
  9599  @node Assembly Code Organisation, Assembly Basics, Assembly Coding, Assembly Coding
  9600  @subsection Code Organisation
  9601  @cindex Assembly code organisation
  9602  @cindex Code organisation
  9603  
  9604  The various @file{mpn} subdirectories contain machine-dependent code, written
  9605  in C or assembly.  The @file{mpn/generic} subdirectory contains default code,
  9606  used when there's no machine-specific version of a particular file.
  9607  
  9608  Each @file{mpn} subdirectory is for an ISA family.  Generally 32-bit and
  9609  64-bit variants in a family cannot share code and have separate directories.
  9610  Within a family further subdirectories may exist for CPU variants.
  9611  
  9612  In each directory a @file{nails} subdirectory may exist, holding code with
  9613  nails support for that CPU variant.  A @code{NAILS_SUPPORT} directive in each
  9614  file indicates the nails values the code handles.  Nails code only exists
  9615  where it's faster, or promises to be faster, than plain code.  There's no
  9616  effort put into nails if they're not going to enhance a given CPU.
  9617  
  9618  
  9619  @node Assembly Basics, Assembly Carry Propagation, Assembly Code Organisation, Assembly Coding
  9620  @subsection Assembly Basics
  9621  
  9622  @code{mpn_addmul_1} and @code{mpn_submul_1} are the most important routines
  9623  for overall GMP performance.  All multiplications and divisions come down to
  9624  repeated calls to these.  @code{mpn_add_n}, @code{mpn_sub_n},
  9625  @code{mpn_lshift} and @code{mpn_rshift} are next most important.
  9626  
  9627  On some CPUs assembly versions of the internal functions
  9628  @code{mpn_mul_basecase} and @code{mpn_sqr_basecase} give significant speedups,
  9629  mainly through avoiding function call overheads.  They can also potentially
  9630  make better use of a wide superscalar processor, as can bigger primitives like
  9631  @code{mpn_addmul_2} or @code{mpn_addmul_4}.
  9632  
  9633  The restrictions on overlaps between sources and destinations
  9634  (@pxref{Low-level Functions}) are designed to facilitate a variety of
  9635  implementations.  For example, knowing @code{mpn_add_n} won't have partly
  9636  overlapping sources and destination means reading can be done far ahead of
  9637  writing on superscalar processors, and loops can be vectorized on a vector
  9638  processor, depending on the carry handling.
  9639  
  9640  
  9641  @node Assembly Carry Propagation, Assembly Cache Handling, Assembly Basics, Assembly Coding
  9642  @subsection Carry Propagation
  9643  @cindex Assembly carry propagation
  9644  
  9645  The problem that presents most challenges in GMP is propagating carries from
  9646  one limb to the next.  In functions like @code{mpn_addmul_1} and
  9647  @code{mpn_add_n}, carries are the only dependencies between limb operations.
  9648  
  9649  On processors with carry flags, a straightforward CISC style @code{adc} is
  9650  generally best.  AMD K6 @code{mpn_addmul_1} however is an example of an
  9651  unusual set of circumstances where a branch works out better.
  9652  
  9653  On RISC processors generally an add and compare for overflow is used.  This
  9654  sort of thing can be seen in @file{mpn/generic/aors_n.c}.  Some carry
  9655  propagation schemes require 4 instructions, meaning at least 4 cycles per
  9656  limb, but other schemes may use just 1 or 2.  On wide superscalar processors
  9657  performance may be completely determined by the number of dependent
  9658  instructions between carry-in and carry-out for each limb.
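
The add and compare idiom can be sketched in plain C as below.  The function
name and the fixed 64-bit limb type are assumptions of this sketch.  It is
written in the general style of the generic C code but is not a copy of it.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[] = up[] + vp[] over n limbs; returns the carry out (0 or 1). */
uint64_t add_n(uint64_t *rp, const uint64_t *up, const uint64_t *vp, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = up[i] + vp[i];
        uint64_t c1 = sum < up[i];      /* overflow from the limb add */
        uint64_t r = sum + carry;
        uint64_t c2 = r < sum;          /* overflow from adding carry-in */
        rp[i] = r;
        carry = c1 | c2;                /* both cannot happen at once */
    }
    return carry;
}
```

The chain from carry-in to carry-out here is an add, a compare and an OR,
which is the kind of dependent sequence that bounds the speed per limb.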
  9659  
  9660  On vector processors good use can be made of the fact that a carry bit only
  9661  very rarely propagates more than one limb.  When adding a single bit to a
  9662  limb, there's only a carry out if that limb was @code{0xFF@dots{}FF} which on
  9663  random data will be only 1 in @m{2\GMPraise{@code{mp\_bits\_per\_limb}},
  9664  2^mp_bits_per_limb}.  @file{mpn/cray/add_n.c} is an example of this, it adds
  9665  all limbs in parallel, adds one set of carry bits in parallel and then only
  9666  rarely needs to fall through to a loop propagating further carries.
  9667  
  9668  On the x86s, GCC (as of version 2.95.2) doesn't generate particularly good code
  9669  for the RISC style idioms that are necessary to handle carry bits in
  9670  C@.  Often conditional jumps are generated where @code{adc} or @code{sbb} forms
  9671  would be better.  And so unfortunately almost any loop involving carry bits
  9672  needs to be coded in assembly for best results.
  9673  
  9674  
  9675  @node Assembly Cache Handling, Assembly Functional Units, Assembly Carry Propagation, Assembly Coding
  9676  @subsection Cache Handling
  9677  @cindex Assembly cache handling
  9678  
  9679  GMP aims to perform well both on operands that fit entirely in L1 cache and
  9680  those which don't.
  9681  
  9682  Basic routines like @code{mpn_add_n} or @code{mpn_lshift} are often used on
  9683  large operands, so L2 and main memory performance is important for them.
  9684  @code{mpn_mul_1} and @code{mpn_addmul_1} are mostly used for multiply and
  9685  square basecases, so L1 performance matters most for them, unless assembly
  9686  versions of @code{mpn_mul_basecase} and @code{mpn_sqr_basecase} exist, in
  9687  which case the remaining uses are mostly for larger operands.
  9688  
  9689  For L2 or main memory operands, memory access times will almost certainly be
  9690  more than the calculation time.  The aim therefore is to maximize memory
  9691  throughput, by starting a load of the next cache line while processing the
  9692  contents of the previous one.  Clearly this is only possible if the chip has a
  9693  lock-up free cache or some sort of prefetch instruction.  Most current chips
  9694  have both these features.
  9695  
  9696  Prefetching sources combines well with loop unrolling, since a prefetch can be
  9697  initiated once per unrolled loop (or more than once if the loop covers more
  9698  than one cache line).
  9699  
  9700  On CPUs without write-allocate caches, prefetching destinations will ensure
  9701  individual stores don't go further down the cache hierarchy, limiting
  9702  bandwidth.  Of course for calculations which are slow anyway, like
  9703  @code{mpn_divrem_1}, write-throughs might be fine.
  9704  
  9705  The distance ahead to prefetch will be determined by memory latency versus
  9706  throughput.  The aim of course is to have data arriving continuously, at peak
  9707  throughput.  Some CPUs have limits on the number of fetches or prefetches in
  9708  progress.
  9709  
  9710  If a special prefetch instruction doesn't exist then a plain load can be used,
  9711  but in that case care must be taken not to attempt to read past the end of an
  9712  operand, since that might produce a segmentation violation.
  9713  
  9714  Some CPUs or systems have hardware that detects sequential memory accesses and
  9715  initiates suitable cache movements automatically, making life easy.
  9716  
  9717  
  9718  @node Assembly Functional Units, Assembly Floating Point, Assembly Cache Handling, Assembly Coding
  9719  @subsection Functional Units
  9720  
  9721  When choosing an approach for an assembly loop, consideration is given to
  9722  what operations can execute simultaneously and what throughput can thereby be
  9723  achieved.  In some cases an algorithm can be tweaked to accommodate available
  9724  resources.
  9725  
  9726  Loop control will generally require a counter and pointer updates, costing as
  9727  much as 5 instructions, plus any delays a branch introduces.  CPU addressing
  9728  modes might reduce pointer updates, perhaps by allowing just one updating
  9729  pointer and others expressed as offsets from it, or on CISC chips with all
  9730  addressing done with the loop counter as a scaled index.
  9731  
  9732  The final loop control cost can be amortised by processing several limbs in
  9733  each iteration (@pxref{Assembly Loop Unrolling}).  This at least ensures loop
  9734  control isn't a big fraction of the work done.
  9735  
  9736  Memory throughput is always a limit.  If perhaps only one load or one store
  9737  can be done per cycle then 3 cycles/limb will be the top speed for ``binary''
  9738  operations like @code{mpn_add_n}, and any code achieving that is optimal.
  9739  
  9740  Integer resources can be freed up by having the loop counter in a float
  9741  register, or by pressing the float units into use for some multiplying,
  9742  perhaps doing every second limb on the float side (@pxref{Assembly Floating
  9743  Point}).
  9744  
  9745  Float resources can be freed up by doing carry propagation on the integer
  9746  side, or even by doing integer to float conversions in integers using bit
  9747  twiddling.
  9748  
  9749  
  9750  @node Assembly Floating Point, Assembly SIMD Instructions, Assembly Functional Units, Assembly Coding
  9751  @subsection Floating Point
  9752  @cindex Assembly floating point
  9753  
  9754  Floating point arithmetic is used in GMP for multiplications on CPUs with poor
  9755  integer multipliers.  It's mostly useful for @code{mpn_mul_1},
  9756  @code{mpn_addmul_1} and @code{mpn_submul_1} on 64-bit machines, and
  9757  @code{mpn_mul_basecase} on both 32-bit and 64-bit machines.
  9758  
  9759  With IEEE 53-bit double precision floats, integer multiplications producing up
  9760  to 53 bits will give exact results.  Breaking a 64@cross{}64 multiplication
  9761  into eight 16@cross{}@math{32@rightarrow{}48} bit pieces is convenient.  With
  9762  some care though six 21@cross{}@math{32@rightarrow{}53} bit products can be
  9763  used, if one of the lower two 21-bit pieces also uses the sign bit.
  9764  
  9765  For the @code{mpn_mul_1} family of functions on a 64-bit machine, the
  9766  invariant single limb is split at the start, into 3 or 4 pieces.  Inside the
  9767  loop, the bignum operand is split into 32-bit pieces.  Fast conversion of
  9768  these unsigned 32-bit pieces to floating point is highly machine-dependent.
  9769  In some cases, reading the data into the integer unit, zero-extending to
  9770  64 bits, then transferring to the floating point unit via memory is the
  9771  only option.
  9772  
  9773  Converting partial products back to 64-bit limbs is usually best done as a
  9774  signed conversion.  Since all values are smaller than @m{2^{53},2^53}, signed
  9775  and unsigned are the same, but most processors lack unsigned conversions.
  9776  
  9777  @sp 2
  9778  
  9779  Here is a diagram showing 16@cross{}32 bit products for an @code{mpn_mul_1} or
  9780  @code{mpn_addmul_1} with a 64-bit limb.  The single limb operand V is split
  9781  into four 16-bit parts.  The multi-limb operand U is split in the loop into
  9782  two 32-bit parts.
  9783  
  9784  @tex
  9785  \global\newdimen\GMPbits      \global\GMPbits=0.18em
  9786  \def\GMPbox#1#2#3{%
  9787    \hbox{%
  9788      \hbox to 128\GMPbits{\hfil
  9789        \vbox{%
  9790          \hrule
  9791          \hbox to 48\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
  9792          \hrule}%
  9793        \hskip #1\GMPbits}%
  9794      \raise \GMPboxdepth \hbox{\hskip 2em #3}}}
  9795  %
  9796  \GMPdisplay{%
  9797    \vbox{%
  9798      \hbox{%
  9799        \hbox to 128\GMPbits {\hfil
  9800          \vbox{%
  9801            \hrule
  9802            \hbox to 64\GMPbits{%
  9803              \GMPvrule \hfil$v48$\hfil
  9804              \vrule    \hfil$v32$\hfil
  9805              \vrule    \hfil$v16$\hfil
  9806              \vrule    \hfil$v00$\hfil
  9807              \vrule}
  9808            \hrule}}%
  9809         \raise \GMPboxdepth \hbox{\hskip 2em V Operand}}
  9810      \vskip 0.5ex
  9811      \hbox{%
  9812        \hbox to 128\GMPbits {\hfil
  9813          \raise \GMPboxdepth \hbox{$\times$\hskip 1.5em}%
  9814          \vbox{%
  9815            \hrule
  9816            \hbox to 64\GMPbits {%
  9817              \GMPvrule \hfil$u32$\hfil
  9818              \vrule \hfil$u00$\hfil
  9819              \vrule}%
  9820            \hrule}}%
  9821         \raise \GMPboxdepth \hbox{\hskip 2em U Operand (one limb)}}%
  9822      \vskip 0.5ex
  9823      \hbox{\vbox to 2ex{\hrule width 128\GMPbits}}%
  9824      \GMPbox{0}{u00 \times v00}{$p00$\hskip 1.5em 48-bit products}%
  9825      \vskip 0.5ex
  9826      \GMPbox{16}{u00 \times v16}{$p16$}
  9827      \vskip 0.5ex
  9828      \GMPbox{32}{u00 \times v32}{$p32$}
  9829      \vskip 0.5ex
  9830      \GMPbox{48}{u00 \times v48}{$p48$}
  9831      \vskip 0.5ex
  9832      \GMPbox{32}{u32 \times v00}{$r32$}
  9833      \vskip 0.5ex
  9834      \GMPbox{48}{u32 \times v16}{$r48$}
  9835      \vskip 0.5ex
  9836      \GMPbox{64}{u32 \times v32}{$r64$}
  9837      \vskip 0.5ex
  9838      \GMPbox{80}{u32 \times v48}{$r80$}
  9839  }}
  9840  @end tex
  9841  @ifnottex
  9842  @example
  9843  @group
  9844                  +---+---+---+---+
  9845                  |v48|v32|v16|v00|    V operand
  9846                  +---+---+---+---+
  9847  
  9848                  +-------+-------+
  9849              x   |  u32  |  u00  |    U operand (one limb)
  9850                  +-------+-------+
  9851  
  9852  ---------------------------------
  9853  
  9854                      +-----------+
  9855                      | u00 x v00 |    p00    48-bit products
  9856                      +-----------+
  9857                  +-----------+
  9858                  | u00 x v16 |        p16
  9859                  +-----------+
  9860              +-----------+
  9861              | u00 x v32 |            p32
  9862              +-----------+
  9863          +-----------+
  9864          | u00 x v48 |                p48
  9865          +-----------+
  9866              +-----------+
  9867              | u32 x v00 |            r32
  9868              +-----------+
  9869          +-----------+
  9870          | u32 x v16 |                r48
  9871          +-----------+
  9872      +-----------+
  9873      | u32 x v32 |                    r64
  9874      +-----------+
  9875  +-----------+
  9876  | u32 x v48 |                        r80
  9877  +-----------+
  9878  @end group
  9879  @end example
  9880  @end ifnottex
  9881  
  9882  @math{p32} and @math{r32} can be summed using floating-point addition, and
  9883  likewise @math{p48} and @math{r48}.  @math{p00} and @math{p16} can be summed
  9884  with @math{r64} and @math{r80} from the previous iteration.
  9885  
  9886  For each loop then, four 49-bit quantities are transferred to the integer unit,
  9887  aligned as follows,
  9888  
  9889  @tex
  9890  % GMPbox here should be 49 bits wide, but use 51 to better show p16+r80'
  9891  % crossing into the upper 64 bits.
  9892  \def\GMPbox#1#2#3{%
  9893    \hbox{%
  9894      \hbox to 128\GMPbits {%
  9895        \hfil
  9896        \vbox{%
  9897          \hrule
  9898          \hbox to 51\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
  9899          \hrule}%
  9900        \hskip #1\GMPbits}%
  9901      \raise \GMPboxdepth \hbox{\hskip 1.5em $#3$\hfil}%
  9902  }}
  9903  \newbox\b \setbox\b\hbox{64 bits}%
  9904  \newdimen\bw \bw=\wd\b \advance\bw by 2em
  9905  \newdimen\x \x=128\GMPbits
  9906  \advance\x by -2\bw
  9907  \divide\x by4
  9908  \GMPdisplay{%
  9909    \vbox{%
  9910      \hbox to 128\GMPbits {%
  9911        \GMPvrule
  9912        \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
  9913        \hfil 64 bits\hfil
  9914        \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
  9915        \vrule
  9916        \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
  9917        \hfil 64 bits\hfil
  9918        \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
  9919        \vrule}%
  9920      \vskip 0.7ex
  9921      \GMPbox{0}{p00+r64'}{i00}
  9922      \vskip 0.5ex
  9923      \GMPbox{16}{p16+r80'}{i16}
  9924      \vskip 0.5ex
  9925      \GMPbox{32}{p32+r32}{i32}
  9926      \vskip 0.5ex
  9927      \GMPbox{48}{p48+r48}{i48}
  9928  }}
  9929  @end tex
  9930  @ifnottex
  9931  @example
  9932  @group
  9933  |-----64bits----|-----64bits----|
  9934                     +------------+
  9935                     | p00 + r64' |    i00
  9936                     +------------+
  9937                 +------------+
  9938                 | p16 + r80' |        i16
  9939                 +------------+
  9940             +------------+
  9941             | p32 + r32  |            i32
  9942             +------------+
  9943         +------------+
  9944         | p48 + r48  |                i48
  9945         +------------+
  9946  @end group
  9947  @end example
  9948  @end ifnottex
  9949  
  9950  The challenge then is to sum these efficiently and add in a carry limb,
  9951  generating a low 64-bit result limb and a high 33-bit carry limb (@math{i48}
  9952  extends 33 bits into the high half).
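
The 16@cross{}32 bit scheme in the diagrams can be sketched in C as follows.
This is only an illustration of why the partial products are exact, not code
from GMP: the function names are hypothetical, and the partial products are
summed one at a time with ordinary integer operations rather than with the
floating-point additions and four aligned quantities described above.

```c
#include <stdint.h>

/* Add the 48-bit value p, shifted left by s bits (0 <= s <= 80),
   into the 128-bit accumulator hi:lo. */
static void add_shifted(uint64_t *hi, uint64_t *lo, uint64_t p, unsigned s)
{
  uint64_t plo, phi;

  if (s == 0) {
    plo = p;
    phi = 0;
  } else if (s < 64) {
    plo = p << s;
    phi = p >> (64 - s);
  } else {
    plo = 0;
    phi = p << (s - 64);
  }
  *lo += plo;
  *hi += phi + (*lo < plo);   /* carry out of the low word */
}

/* 64x64 -> 128 bit multiply built from eight double precision products.
   u is split into two 32-bit pieces and v into four 16-bit pieces. */
void mul_64x64_double(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
  int i, j;

  *hi = 0;
  *lo = 0;
  for (i = 0; i < 2; i++) {
    double ud = (double) (uint32_t) (u >> (32 * i));
    for (j = 0; j < 4; j++) {
      double vd = (double) (uint16_t) (v >> (16 * j));
      double p = ud * vd;   /* below 2^48, so exact in a 53-bit mantissa */
      /* Signed conversion back to an integer, as discussed above. */
      add_shifted(hi, lo, (uint64_t) (int64_t) p, 32 * i + 16 * j);
    }
  }
}
```

Each product is below @m{2^{48},2^48}, so the conversion back to an integer
loses nothing and the 128-bit sum is the exact product.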
  9953  
  9954  
  9955  @node Assembly SIMD Instructions, Assembly Software Pipelining, Assembly Floating Point, Assembly Coding
  9956  @subsection SIMD Instructions
  9957  @cindex Assembly SIMD
  9958  
  9959  The single-instruction multiple-data support in current microprocessors is
  9960  aimed at signal processing algorithms where each data point can be treated
  9961  more or less independently.  There's generally not much support for
  9962  propagating the sort of carries that arise in GMP.
  9963  
  9964  SIMD multiplications of say four 16@cross{}16 bit multiplies only do as much
  9965  work as one 32@cross{}32 from GMP's point of view, and need some shifts and
  9966  adds besides.  But of course if say the SIMD form is fully pipelined and uses
  9967  less instruction decoding then it may still be worthwhile.
  9968  
  9969  On the x86 chips, MMX has so far found a use in @code{mpn_rshift} and
  9970  @code{mpn_lshift}, and is used in a special case for 16-bit multipliers in the
  9971  P55 @code{mpn_mul_1}.  SSE2 is used for Pentium 4 @code{mpn_mul_1},
  9972  @code{mpn_addmul_1}, and @code{mpn_submul_1}.
  9973  
  9974  
  9975  @node Assembly Software Pipelining, Assembly Loop Unrolling, Assembly SIMD Instructions, Assembly Coding
  9976  @subsection Software Pipelining
  9977  @cindex Assembly software pipelining
  9978  
  9979  Software pipelining consists of scheduling instructions around the branch
  9980  point in a loop.  For example a loop might issue a load not for use in the
  9981  present iteration but the next, thereby allowing extra cycles for the data to
  9982  arrive from memory.
  9983  
  9984  Naturally this is wanted only when doing things like loads or multiplies that
  9985  take several cycles to complete, and only where a CPU has multiple functional
  9986  units so that other work can be done in the meantime.
  9987  
  9988  A pipeline with several stages will have a data value in progress at each
  9989  stage and each loop iteration moves them along one stage.  This is like
  9990  juggling.
  9991  
  9992  If the latency of some instruction is greater than the loop time then it will
  9993  be necessary to unroll, so one register has a result ready to use while
  9994  another (or multiple others) are still in progress.  (@pxref{Assembly Loop
  9995  Unrolling}).
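
The shape of a software pipelined loop can be sketched in C, though a
compiler is free to reschedule it and real gains come from hand scheduling in
assembly.  The function below is a hypothetical @code{mpn_mul_1} style loop on
32-bit limbs, written for illustration only.  The load for the next iteration
is issued before the multiply for the current one, with a feed-in load before
the loop and a wind-down step after it.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[0..n-1] = up[0..n-1] * v, returning the carry-out limb.
   The load of up[i+1] is issued one iteration ahead of its use. */
uint32_t mul_1_pipelined(uint32_t *rp, const uint32_t *up, size_t n,
                         uint32_t v)
{
  uint64_t p, cy = 0;
  uint32_t cur, next;
  size_t i;

  if (n == 0)
    return 0;

  cur = up[0];                      /* feed-in: first load */
  for (i = 0; i + 1 < n; i++) {
    next = up[i + 1];               /* load for the next iteration */
    p = (uint64_t) cur * v + cy;    /* work on the limb loaded earlier */
    rp[i] = (uint32_t) p;
    cy = p >> 32;
    cur = next;
  }
  p = (uint64_t) cur * v + cy;      /* wind-down: no further load */
  rp[n - 1] = (uint32_t) p;
  return (uint32_t) (p >> 32);
}
```

Because the wind-down step performs no load, the function never reads past
the end of the operand.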
  9996  
  9997  
  9998  @node Assembly Loop Unrolling, Assembly Writing Guide, Assembly Software Pipelining, Assembly Coding
  9999  @subsection Loop Unrolling
 10000  @cindex Assembly loop unrolling
 10001  
 10002  Loop unrolling consists of replicating code so that several limbs are
 10003  processed in each loop.  At a minimum this reduces loop overheads by a
 10004  corresponding factor, but it can also allow better register usage, for example
 10005  alternately using one register combination and then another.  Judicious use of
 10006  @command{m4} macros can help avoid lots of duplication in the source code.
 10007  
 10008  Any amount of unrolling can be handled with a loop counter that's decremented
 10009  by @math{N} each time, stopping when the remaining count is less than the
 10010  further @math{N} the loop will process.  Or by subtracting @math{N} at the
 10011  start, the termination condition becomes when the counter @math{C} is less
 10012  than 0 (and the count of remaining limbs is @math{C+N}).
 10013  
 10014  Alternatively, for a power of 2 unroll, the loop count and remainder can be
 10015  established with a shift and mask.  This is convenient if also making a
 10016  computed jump into the middle of a large loop.
 10017  
 10018  The limbs not a multiple of the unrolling can be handled in various ways, for
 10019  example
 10020  
 10021  @itemize @bullet
 10022  @item
 10023  A simple loop at the end (or the start) to process the excess.  Care will be
 10024  wanted that it isn't too much slower than the unrolled part.
 10025  
 10026  @item
 10027  A set of binary tests, for example after an 8-limb unrolling, test for 4 more
 10028  limbs to process, then a further 2 more or not, and finally 1 more or not.
 10029  This will probably take more code space than a simple loop.
 10030  
 10031  @item
 10032  A @code{switch} statement, providing separate code for each possible excess,
 10033  for example an 8-limb unrolling would have separate code for 0 remaining, 1
 10034  remaining, etc, up to 7 remaining.  This might take a lot of code, but may be
 10035  the best way to optimize all cases in combination with a deep pipelined loop.
 10036  
 10037  @item
 10038  A computed jump into the middle of the loop, thus making the first iteration
 10039  handle the excess.  This should make times smoothly increase with size, which
 10040  is attractive, but setups for the jump and adjustments for pointers can be
 10041  tricky and could become quite difficult in combination with deep pipelining.
 10042  @end itemize
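
As a minimal sketch of the counter technique and of the first option in the
list above, here is a 4-way unrolled addition loop in C.  The function and
macro names are hypothetical and the code is not taken from GMP.  The counter
has @math{N=4} subtracted at the start, so the unrolled loop runs while it is
non-negative, and the excess limbs are handled by a simple loop at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Add limb i of up and vp plus the carry in cy, store to rp[i],
   and leave the carry-out in cy. */
#define ADD_LIMB(i)                         \
  do {                                      \
    uint64_t u = up[i], s = u + vp[i];      \
    uint64_t r = s + cy;                    \
    cy = (s < u) | (r < s);                 \
    rp[i] = r;                              \
  } while (0)

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1], returning the carry-out. */
uint64_t add_n_unrolled(uint64_t *rp, const uint64_t *up,
                        const uint64_t *vp, size_t n)
{
  uint64_t cy = 0;
  ptrdiff_t c = (ptrdiff_t) n - 4;   /* subtract N up front */

  for (; c >= 0; c -= 4) {           /* at least 4 limbs remain */
    ADD_LIMB(0);
    ADD_LIMB(1);
    ADD_LIMB(2);
    ADD_LIMB(3);
    up += 4;
    vp += 4;
    rp += 4;
  }
  for (c += 4; c > 0; c--) {         /* C+N limbs of excess, 0 to 3 */
    ADD_LIMB(0);
    up++;
    vp++;
    rp++;
  }
  return cy;
}
```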
 10043  
 10044  
 10045  @node Assembly Writing Guide,  , Assembly Loop Unrolling, Assembly Coding
 10046  @subsection Writing Guide
 10047  @cindex Assembly writing guide
 10048  
 10049  This is a guide to writing software pipelined loops for processing limb
 10050  vectors in assembly.
 10051  
 10052  First determine the algorithm and which instructions are needed.  Code it
 10053  without unrolling or scheduling, to make sure it works.  On a 3-operand CPU
 10054  try to write each new value to a new register; this will greatly simplify later
 10055  steps.
 10056  
 10057  Then note for each instruction the functional unit and/or issue port
 10058  requirements.  If an instruction can use either of two units, like U0 or U1,
 10059  then make a category ``U0/U1''.  Count the total using each unit (or combined
 10060  unit), and count all instructions.
 10061  
 10062  Figure out from those counts the best possible loop time.  The goal will be to
 10063  find a perfect schedule where instruction latencies are completely hidden.
 10064  The total instruction count might be the limiting factor, or perhaps a
 10065  particular functional unit.  It might be possible to tweak the instructions to
 10066  help the limiting factor.
 10067  
 10068  Suppose the loop time is @math{N}, then make @math{N} issue buckets, with the
 10069  final loop branch at the end of the last.  Now fill the buckets with dummy
 10070  instructions using the functional units desired.  Run this to make sure the
 10071  intended speed is reached.
 10072  
 10073  Now replace the dummy instructions with the real instructions from the slow
 10074  but correct loop you started with.  The first will typically be a load
 10075  instruction.  Then the instruction using that value is placed in a bucket an
 10076  appropriate distance down.  Run the loop again, to check it still runs at
 10077  target speed.
 10078  
 10079  Keep placing instructions, frequently measuring the loop.  After a few you
 10080  will need to wrap around from the last bucket back to the top of the loop.  If
 10081  you used the new-register for new-value strategy above then there will be no
 10082  register conflicts.  If not then take care not to clobber something already in
 10083  use.  Changing registers at this time is very error prone.
 10084  
 10085  The loop will overlap two or more of the original loop iterations, and the
 10086  computation of one vector element result will be started in one iteration of
 10087  the new loop, and completed one or several iterations later.
 10088  
 10089  The final step is to create feed-in and wind-down code for the loop.  A good
 10090  way to do this is to make a copy (or copies) of the loop at the start and
 10091  delete those instructions which don't have valid antecedents, and at the end
 10092  replicate and delete those whose results are unwanted (including any further
 10093  loads).
 10094  
 10095  The loop will have a minimum number of limbs loaded and processed, so the
 10096  feed-in code must test if the request size is smaller and skip either to a
 10097  suitable part of the wind-down or to special code for small sizes.
 10098  
 10099  
 10100  @node Internals, Contributors, Algorithms, Top
 10101  @chapter Internals
 10102  @cindex Internals
 10103  
 10104  @strong{This chapter is provided only for informational purposes and the
 10105  various internals described here may change in future GMP releases.
 10106  Applications expecting to be compatible with future releases should use only
 10107  the documented interfaces described in previous chapters.}
 10108  
 10109  @menu
 10110  * Integer Internals::
 10111  * Rational Internals::
 10112  * Float Internals::
 10113  * Raw Output Internals::
 10114  * C++ Interface Internals::
 10115  @end menu
 10116  
 10117  @node Integer Internals, Rational Internals, Internals, Internals
 10118  @section Integer Internals
 10119  @cindex Integer internals
 10120  
 10121  @code{mpz_t} variables represent integers using sign and magnitude, in space
 10122  dynamically allocated and reallocated.  The fields are as follows.
 10123  
 10124  @table @asis
 10125  @item @code{_mp_size}
 10126  The number of limbs, or the negative of that when representing a negative
 10127  integer.  Zero is represented by @code{_mp_size} set to zero, in which case
 10128  the @code{_mp_d} data is unused.
 10129  
 10130  @item @code{_mp_d}
 10131  A pointer to an array of limbs which is the magnitude.  These are stored
 10132  ``little endian'' as per the @code{mpn} functions, so @code{_mp_d[0]} is the
 10133  least significant limb and @code{_mp_d[ABS(_mp_size)-1]} is the most
 10134  significant.  Whenever @code{_mp_size} is non-zero, the most significant limb
 10135  is non-zero.
 10136  
 10137  Currently there's always at least one limb allocated, so for instance
 10138  @code{mpz_set_ui} never needs to reallocate, and @code{mpz_get_ui} can fetch
 10139  @code{_mp_d[0]} unconditionally (though its value is then only wanted if
 10140  @code{_mp_size} is non-zero).
 10141  
 10142  @item @code{_mp_alloc}
 10143  @code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d},
 10144  and naturally @code{_mp_alloc >= ABS(_mp_size)}.  When an @code{mpz} routine
 10145  is about to (or might be about to) increase @code{_mp_size}, it checks
 10146  @code{_mp_alloc} to see whether there's enough space, and reallocates if not.
 10147  @code{MPZ_REALLOC} is generally used for this.
 10148  @end table
 10149  
 10150  The various bitwise logical functions like @code{mpz_and} behave as if
 10151  negative values were twos complement.  But sign and magnitude is always used
 10152  internally, and necessary adjustments are made during the calculations.
 10153  Sometimes this isn't pretty, but sign and magnitude are best for other
 10154  routines.
 10155  
 10156  Some internal temporary variables are set up with @code{MPZ_TMP_INIT} and these
 10157  have @code{_mp_d} space obtained from @code{TMP_ALLOC} rather than the memory
 10158  allocation functions.  Care is taken to ensure that these are big enough that
 10159  no reallocation is necessary (since it would have unpredictable consequences).
 10160  
 10161  @code{_mp_size} and @code{_mp_alloc} are @code{int}, although @code{mp_size_t}
 10162  is usually a @code{long}.  This is done to make the fields just 32 bits on
 10163  some 64-bit systems, thereby saving a few bytes of data space but still
 10164  providing plenty of range.
 10165  
 10166  
 10167  @node Rational Internals, Float Internals, Integer Internals, Internals
 10168  @section Rational Internals
 10169  @cindex Rational internals
 10170  
 10171  @code{mpq_t} variables represent rationals using an @code{mpz_t} numerator and
 10172  denominator (@pxref{Integer Internals}).
 10173  
 10174  The canonical form adopted is denominator positive (and non-zero), no common
 10175  factors between numerator and denominator, and zero uniquely represented as
 10176  0/1.
 10177  
 10178  It's believed that casting out common factors at each stage of a calculation
 10179  is best in general.  A GCD is an @math{O(N^2)} operation so it's better to do
 10180  a few small ones immediately than to delay and have to do a big one later.
 10181  Knowing the numerator and denominator have no common factors can be used for
 10182  example in @code{mpq_mul} to make only two cross GCDs necessary, not four.
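
The two cross GCDs can be illustrated on machine words.  The sketch below
uses unsigned 64-bit values in place of @code{mpz_t}, ignores signs and
overflow, and uses hypothetical function names.  It shows the idea only and is
not the @code{mpq_mul} implementation.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
  while (b != 0) {
    uint64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Set *rn / *rd to (an/ad) * (bn/bd).  Both inputs must already be in
   lowest terms with non-zero denominators, so only the cross pairs
   (an,bd) and (bn,ad) can share factors. */
void frac_mul(uint64_t an, uint64_t ad, uint64_t bn, uint64_t bd,
              uint64_t *rn, uint64_t *rd)
{
  uint64_t g1, g2;

  assert(ad != 0 && bd != 0);
  g1 = gcd_u64(an, bd);
  g2 = gcd_u64(bn, ad);
  *rn = (an / g1) * (bn / g2);
  *rd = (ad / g2) * (bd / g1);
}
```

For example 2/3 times 3/4 gives cross GCDs of 2 and 3, and the result 1/2 is
already in lowest terms without a further GCD on the product.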
 10183  
 10184  This general approach to common factors is badly sub-optimal in the presence
 10185  of simple factorizations or little prospect for cancellation, but GMP has no
 10186  way to know when this will occur.  As per @ref{Efficiency}, that's left to
 10187  applications.  The @code{mpq_t} framework might still suit, with
 10188  @code{mpq_numref} and @code{mpq_denref} for direct access to the numerator and
 10189  denominator, or of course @code{mpz_t} variables can be used directly.
 10190  
 10191  
 10192  @node Float Internals, Raw Output Internals, Rational Internals, Internals
 10193  @section Float Internals
 10194  @cindex Float internals
 10195  
 10196  Efficient calculation is the primary aim of GMP floats and the use of whole
 10197  limbs and simple rounding facilitates this.
 10198  
 10199  @code{mpf_t} floats have a variable precision mantissa and a single machine
 10200  word signed exponent.  The mantissa is represented using sign and magnitude.
 10201  
 10202  @c FIXME: The arrow heads don't join to the lines exactly.
 10203  @tex
 10204  \global\newdimen\GMPboxwidth \GMPboxwidth=5em
 10205  \global\newdimen\GMPboxheight \GMPboxheight=3ex
 10206  \def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
 10207  \GMPdisplay{%
 10208  \vbox{%
 10209    \hbox to 5\GMPboxwidth {most significant limb \hfil least significant limb}
 10210    \vskip 0.7ex
 10211    \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
 10212    \hbox {
 10213      \hbox to 3\GMPboxwidth {%
 10214        \setbox 0 = \hbox{@code{\_mp\_exp}}%
 10215        \dimen0=3\GMPboxwidth
 10216        \advance\dimen0 by -\wd0
 10217        \divide\dimen0 by 2
 10218        \advance\dimen0 by -1em
 10219        \setbox1 = \hbox{$\rightarrow$}%
 10220        \dimen1=\dimen0
 10221        \advance\dimen1 by -\wd1
 10222        \GMPcentreline{\dimen0}%
 10223        \hfil
 10224        \box0%
 10225        \hfil
 10226        \GMPcentreline{\dimen1{}}%
 10227        \box1}
 10228      \hbox to 2\GMPboxwidth {\hfil @code{\_mp\_d}}}
 10229    \vskip 0.5ex
 10230    \vbox {%
 10231      \hrule
 10232      \hbox{%
 10233        \vrule height 2ex depth 1ex
 10234        \hbox to \GMPboxwidth {}%
 10235        \vrule
 10236        \hbox to \GMPboxwidth {}%
 10237        \vrule
 10238        \hbox to \GMPboxwidth {}%
 10239        \vrule
 10240        \hbox to \GMPboxwidth {}%
 10241        \vrule
 10242        \hbox to \GMPboxwidth {}%
 10243        \vrule}
 10244      \hrule
 10245    }
 10246    \hbox {%
 10247      \hbox to 0.8 pt {}
 10248      \hbox to 3\GMPboxwidth {%
 10249        \hfil $\cdot$} \hbox {$\leftarrow$ radix point\hfil}}
 10250    \hbox to 5\GMPboxwidth{%
 10251      \setbox 0 = \hbox{@code{\_mp\_size}}%
 10252      \dimen0 = 5\GMPboxwidth
 10253      \advance\dimen0 by -\wd0
 10254      \divide\dimen0 by 2
 10255      \advance\dimen0 by -1em
 10256      \dimen1 = \dimen0
 10257      \setbox1 = \hbox{$\leftarrow$}%
 10258      \setbox2 = \hbox{$\rightarrow$}%
 10259      \advance\dimen0 by -\wd1
 10260      \advance\dimen1 by -\wd2
 10261      \hbox to 0.3 em {}%
 10262      \box1
 10263      \GMPcentreline{\dimen0}%
 10264      \hfil
 10265      \box0
 10266      \hfil
 10267      \GMPcentreline{\dimen1}%
 10268      \box2}
 10269  }}
 10270  @end tex
 10271  @ifnottex
 10272  @example
 10273     most                   least
 10274  significant            significant
 10275     limb                   limb
 10276  
 10277                              _mp_d
 10278   |---- _mp_exp --->           |
 10279    _____ _____ _____ _____ _____
 10280   |_____|_____|_____|_____|_____|
 10281                     . <------------ radix point
 10282  
 10283    <-------- _mp_size --------->
 10284  @sp 1
 10285  @end example
 10286  @end ifnottex
 10287  
 10288  @noindent
 10289  The fields are as follows.
 10290  
 10291  @table @asis
 10292  @item @code{_mp_size}
 10293  The number of limbs currently in use, or the negative of that when
 10294  representing a negative value.  Zero is represented by @code{_mp_size} and
 10295  @code{_mp_exp} both set to zero, and in that case the @code{_mp_d} data is
 10296  unused.  (In the future @code{_mp_exp} might be undefined when representing
 10297  zero.)
 10298  
 10299  @item @code{_mp_prec}
 10300  The precision of the mantissa, in limbs.  In any calculation the aim is to
 10301  produce @code{_mp_prec} limbs of result (the most significant being non-zero).
 10302  
 10303  @item @code{_mp_d}
 10304  A pointer to the array of limbs which is the absolute value of the mantissa.
 10305  These are stored ``little endian'' as per the @code{mpn} functions, so
 10306  @code{_mp_d[0]} is the least significant limb and
 10307  @code{_mp_d[ABS(_mp_size)-1]} the most significant.
 10308  
 10309  The most significant limb is always non-zero, but there are no other
 10310  restrictions on its value, in particular the highest 1 bit can be anywhere
 10311  within the limb.
 10312  
 10313  @code{_mp_prec+1} limbs are allocated to @code{_mp_d}, the extra limb being
 10314  for convenience (see below).  There are no reallocations during a calculation,
 10315  only in a change of precision with @code{mpf_set_prec}.
 10316  
 10317  @item @code{_mp_exp}
 10318  The exponent, in limbs, determining the location of the implied radix point.
 10319  Zero means the radix point is just above the most significant limb.  Positive
 10320  values mean a radix point offset towards the lower limbs and hence a value
 10321  @math{@ge{} 1}, as for example in the diagram above.  Negative exponents mean
 10322  a radix point further above the highest limb.
 10323  
 10324  Naturally the exponent can be any value, it doesn't have to fall within the
 10325  limbs as the diagram shows, it can be a long way above or a long way below.
 10326  Limbs other than those included in the @code{@{_mp_d,_mp_size@}} data
 10327  are treated as zero.
 10328  @end table
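
To make the meaning of @code{_mp_exp} concrete, the following C sketch
evaluates a toy float laid out as described above, assuming 32-bit limbs.  The
function name is hypothetical and the arguments merely stand in for the
@code{_mp_d}, @code{_mp_size} and @code{_mp_exp} fields.  The result is only a
@code{double} approximation and this is not how @code{mpf_get_d} works.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate value of the limbs d[0..|size|-1] with exponent exp (in
   limbs).  The most significant limb has weight B^(exp-1), B = 2^32,
   and each lower limb has weight one factor of B smaller. */
double toy_mpf_get_d(const uint32_t *d, int size, long exp)
{
  const double B = 4294967296.0;   /* 2^32 */
  int n = size < 0 ? -size : size;
  double w = 1.0, r = 0.0;
  long e;
  int i;

  assert(n == 0 || d != 0);

  for (e = exp - 1; e > 0; e--)    /* w = B^(exp-1) for exp > 1 */
    w *= B;
  for (e = exp - 1; e < 0; e++)    /* w = B^(exp-1) for exp < 1 */
    w /= B;

  for (i = n - 1; i >= 0; i--) {   /* most significant limb first */
    r += d[i] * w;
    w /= B;
  }
  return size < 0 ? -r : r;        /* sign is carried by size */
}
```

With limbs @code{@{0x80000000, 1@}}, a size of 2 and an exponent of 1, the
high limb is the integer part and the low limb is the first fraction limb,
giving 1.5.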
 10329  
 10330  The @code{_mp_size} and @code{_mp_prec} fields are @code{int}, although the
 10331  @code{mp_size_t} type is usually a @code{long}.  The @code{_mp_exp} field is
 10332  usually @code{long}.  This is done to make some fields just 32 bits on some
 10333  64-bit systems, thereby saving a few bytes of data space but still providing
 10334  plenty of precision and a very large range.
 10335  
 10336  
 10337  @sp 1
 10338  @noindent
 10339  The following points should be noted.
 10340  
 10341  @table @asis
 10342  @item Low Zeros
 10343  The least significant limbs @code{_mp_d[0]} etc can be zero, though such low
 10344  zeros can always be ignored.  Routines likely to produce low zeros check and
 10345  avoid them to save time in subsequent calculations, but for most routines
 10346  they're quite unlikely and aren't checked.
 10347  
 10348  @item Mantissa Size Range
 10349  The @code{_mp_size} count of limbs in use can be less than @code{_mp_prec} if
 10350  the value can be represented in less.  This means low precision values or
 10351  small integers stored in a high precision @code{mpf_t} can still be operated
 10352  on efficiently.
 10353  
 10354  @code{_mp_size} can also be greater than @code{_mp_prec}.  Firstly a value is
 10355  allowed to use all of the @code{_mp_prec+1} limbs available at @code{_mp_d},
 10356  and secondly when @code{mpf_set_prec_raw} lowers @code{_mp_prec} it leaves
 10357  @code{_mp_size} unchanged and so the size can be arbitrarily bigger than
 10358  @code{_mp_prec}.
 10359  
 10360  @item Rounding
 10361  All rounding is done on limb boundaries.  Calculating @code{_mp_prec} limbs
 10362  with the high non-zero will ensure the application requested minimum precision
 10363  is obtained.
 10364  
 10365  The use of simple ``trunc'' rounding towards zero is efficient, since there's
 10366  no need to examine extra limbs and increment or decrement.
 10367  
 10368  @item Bit Shifts
 10369  Since the exponent is in limbs, there are no bit shifts in basic operations
 10370  like @code{mpf_add} and @code{mpf_mul}.  When differing exponents are
 10371  encountered all that's needed is to adjust pointers to line up the relevant
 10372  limbs.
 10373  
 10374  Of course @code{mpf_mul_2exp} and @code{mpf_div_2exp} will require bit shifts,
 10375  but the choice is between an exponent in limbs which requires shifts there, or
 10376  one in bits which requires them almost everywhere else.
 10377  
 10378  @item Use of @code{_mp_prec+1} Limbs
 10379  The extra limb on @code{_mp_d} (@code{_mp_prec+1} rather than just
 10380  @code{_mp_prec}) helps when an @code{mpf} routine might get a carry from its
 10381  operation.  @code{mpf_add} for instance will do an @code{mpn_add} of
 10382  @code{_mp_prec} limbs.  If there's no carry then that's the result, but if
 10383  there is a carry then it's stored in the extra limb of space and
 10384  @code{_mp_size} becomes @code{_mp_prec+1}.
 10385  
 10386  Whenever @code{_mp_prec+1} limbs are held in a variable, the low limb is not
 10387  needed for the intended precision, only the @code{_mp_prec} high limbs.  But
 10388  zeroing it out or moving the rest down is unnecessary.  Subsequent routines
 10389  reading the value will simply take the high limbs they need, and this will be
 10390  @code{_mp_prec} if their target has that same precision.  This is no more than
 10391  a pointer adjustment, and must be checked anyway since the destination
 10392  precision can be different from the sources.
 10393  
 10394  Copy functions like @code{mpf_set} will retain a full @code{_mp_prec+1} limbs
 10395  if available.  This ensures that a variable which has @code{_mp_size} equal to
 10396  @code{_mp_prec+1} will get its full exact value copied.  Strictly speaking
 10397  this is unnecessary since only @code{_mp_prec} limbs are needed for the
 10398  application's requested precision, but it's considered that an @code{mpf_set}
 10399  from one variable into another of the same precision ought to produce an exact
 10400  copy.
 10401  
 10402  @item Application Precisions
 10403  @code{__GMPF_BITS_TO_PREC} converts an application requested precision to an
 10404  @code{_mp_prec}.  The value in bits is rounded up to a whole limb, then an
 10405  extra limb is added since the most significant limb of @code{_mp_d} is only
 10406  guaranteed to be non-zero and therefore might contain only one bit.
 10407  
 10408  @code{__GMPF_PREC_TO_BITS} does the reverse conversion, and removes the extra
 10409  limb from @code{_mp_prec} before converting to bits.  The net effect of
 10410  reading back with @code{mpf_get_prec} is simply the precision rounded up to a
 10411  multiple of @code{mp_bits_per_limb}.
 10412  
 10413  Note that the extra limb added here, to allow for a high limb that is merely
 10414  non-zero, is in addition to the extra limb allocated to @code{_mp_d}.  For
 10415  example with a 32-bit limb, an application request for 250 bits will be
 10416  rounded up to 8 limbs, then one more is added for the possibly sparse high
 10417  limb, giving an @code{_mp_prec} of 9.  @code{_mp_d} then gets 10 limbs
 10418  allocated.  Reading back with @code{mpf_get_prec} will take @code{_mp_prec},
 10419  subtract 1 limb and multiply by 32, giving 256 bits.
 10420  
 10421  Strictly speaking, the fact that the high limb has at least one bit means that a
 10422  float with, say, 3 limbs of 32 bits each will be holding at least 65 bits, but
 10423  for the purposes of @code{mpf_t} it's considered simply to be 64 bits, a nice
 10424  multiple of the limb size.
 10425  @end table
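
The arithmetic described under ``Application Precisions'' can be sketched as
follows.  This is only an illustration, not the library's code: the helper
names @code{bits_to_prec}, @code{prec_to_bits} and @code{limbs_allocated} are
hypothetical, the limb size is passed as a parameter rather than taken from
@code{mp_bits_per_limb}, and any minimum precision the library may impose is
ignored.

```cpp
#include <cassert>

// Hypothetical helpers showing the arithmetic only; these are not the
// GMP macros themselves.

// Limbs of precision (_mp_prec) for an application request of `bits` bits.
long bits_to_prec(long bits, int limb_bits)
{
  assert(bits > 0 && limb_bits > 0);
  long whole_limbs = (bits + limb_bits - 1) / limb_bits;  // round up
  return whole_limbs + 1;  // extra limb: the high limb may hold only one bit
}

// Bits reported back for a given _mp_prec: drop the extra limb again.
long prec_to_bits(long prec, int limb_bits)
{
  assert(prec >= 1 && limb_bits > 0);
  return (prec - 1) * limb_bits;
}

// Limbs of space at _mp_d: one more than _mp_prec, to hold a carry.
long limbs_allocated(long prec)
{
  assert(prec >= 1);
  return prec + 1;
}
```

With these helpers @code{bits_to_prec(250, 32)} gives 9,
@code{limbs_allocated(9)} gives 10 and @code{prec_to_bits(9, 32)} gives 256,
matching the 32-bit example above.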
 10426  
 10427  
 10428  @node Raw Output Internals, C++ Interface Internals, Float Internals, Internals
 10429  @section Raw Output Internals
 10430  @cindex Raw output internals
 10431  
 10432  @noindent
 10433  @code{mpz_out_raw} uses the following format.
 10434  
 10435  @tex
 10436  \global\newdimen\GMPboxwidth \GMPboxwidth=5em
 10437  \global\newdimen\GMPboxheight \GMPboxheight=3ex
 10438  \def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
 10439  \GMPdisplay{%
 10440  \vbox{%
 10441    \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
 10442    \vbox {%
 10443      \hrule
 10444      \hbox{%
 10445        \vrule height 2.5ex depth 1.5ex
 10446        \hbox to \GMPboxwidth {\hfil size\hfil}%
 10447        \vrule
 10448        \hbox to 3\GMPboxwidth {\hfil data bytes\hfil}%
 10449        \vrule}
 10450      \hrule}
 10451  }}
 10452  @end tex
 10453  @ifnottex
 10454  @example
 10455  +------+------------------------+
 10456  | size |       data bytes       |
 10457  +------+------------------------+
 10458  @end example
 10459  @end ifnottex
 10460  
 10461  The size is 4 bytes written most significant byte first, being the number of
 10462  subsequent data bytes, or the two's complement negative of that when a negative
 10463  integer is represented.  The data bytes are the absolute value of the integer,
 10464  written most significant byte first.
 10465  
 10466  The most significant data byte is always non-zero, so the output is the same
 10467  on all systems, irrespective of limb size.
 10468  
 10469  In GMP 1, leading zero bytes were written to pad the data bytes to a multiple
 10470  of the limb size.  @code{mpz_inp_raw} will still accept this, for
 10471  compatibility.
 10472  
 10473  The use of ``big endian'' for both the size and data fields is deliberate; it
 10474  makes the data easy to read in a hex dump of a file.  Unfortunately it also
 10475  means that the limb data must be reversed when reading or writing, so neither
 10476  a big endian nor little endian system can just read and write @code{_mp_d}.
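
As an illustration of the layout, the following sketch builds the byte stream
from a sign and a big-endian magnitude.  The function @code{encode_raw} is a
hypothetical helper, not part of GMP, and it assumes that zero is written as a
size of 0 with no data bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a GMP function: produce the size field followed
// by the data bytes for a value given as a sign and big-endian magnitude.
std::vector<std::uint8_t> encode_raw(bool negative,
                                     const std::vector<std::uint8_t> &magnitude)
{
  // Skip leading zero bytes so the first data byte written is non-zero.
  std::size_t first = 0;
  while (first < magnitude.size() && magnitude[first] == 0)
    ++first;
  std::size_t count = magnitude.size() - first;
  assert(count <= 0x7FFFFFFFu);  // must fit the signed 4-byte size field

  // Size field: the byte count, or its two's complement negative.
  std::uint32_t size = static_cast<std::uint32_t>(count);
  if (negative && count != 0)
    size = ~size + 1u;

  std::vector<std::uint8_t> out;
  for (int shift = 24; shift >= 0; shift -= 8)  // most significant byte first
    out.push_back(static_cast<std::uint8_t>(size >> shift));
  out.insert(out.end(),
             magnitude.begin() + static_cast<std::ptrdiff_t>(first),
             magnitude.end());
  return out;
}
```

For example a magnitude of bytes 01 02 gives 00 00 00 02 01 02, and the same
magnitude with a negative sign gives FF FF FF FE 01 02.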
 10477  
 10478  
 10479  @node C++ Interface Internals,  , Raw Output Internals, Internals
 10480  @section C++ Interface Internals
 10481  @cindex C++ interface internals
 10482  
 10483  A system of expression templates is used to ensure something like @code{a=b+c}
 10484  turns into a simple call to @code{mpz_add} etc.  For @code{mpf_class}
 10485  the scheme also ensures the precision of the final
 10486  destination is used for any temporaries within a statement like
 10487  @code{f=w*x+y*z}.  These are important features which a naive implementation
 10488  cannot provide.
 10489  
 10490  A simplified description of the scheme follows.  The true scheme is
 10491  complicated by the fact that expressions have different return types.  For
 10492  detailed information, refer to the source code.
 10493  
 10494  To perform an operation, say, addition, we first define a ``function object''
 10495  evaluating it,
 10496  
 10497  @example
 10498  struct __gmp_binary_plus
 10499  @{
 10500    static void eval(mpf_t f, const mpf_t g, const mpf_t h)
 10501    @{
 10502      mpf_add(f, g, h);
 10503    @}
 10504  @};
 10505  @end example
 10506  
 10507  @noindent
 10508  And an ``additive expression'' object,
 10509  
 10510  @example
 10511  __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
 10512  operator+(const mpf_class &f, const mpf_class &g)
 10513  @{
 10514    return __gmp_expr
 10515      <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
 10516  @}
 10517  @end example
 10518  
 10519  The seemingly redundant @code{__gmp_expr<__gmp_binary_expr<@dots{}>>} is used to
 10520  encapsulate any possible kind of expression into a single template type.  In
 10521  fact even @code{mpf_class} etc.@: are @code{typedef} specializations of
 10522  @code{__gmp_expr}.
 10523  
 10524  Next we define assignment of @code{__gmp_expr} to @code{mpf_class}.
 10525  
 10526  @example
 10527  template <class T>
 10528  mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
 10529  @{
 10530    expr.eval(this->get_mpf_t(), this->precision());
 10531    return *this;
 10532  @}
 10533  
 10534  template <class Op>
 10535  void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
 10536  (mpf_t f, mp_bitcnt_t precision)
 10537  @{
 10538    Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
 10539  @}
 10540  @end example
 10541  
 10542  where @code{expr.val1} and @code{expr.val2} are references to the expression's
 10543  operands (here @code{expr} is the @code{__gmp_binary_expr} stored within the
 10544  @code{__gmp_expr}).
 10545  
 10546  This way, the expression is actually evaluated only at the time of assignment,
 10547  when the required precision (that of @code{f}) is known.  Furthermore the
 10548  target @code{mpf_t} is now available, thus we can call @code{mpf_add} directly
 10549  with @code{f} as the output argument.
 10550  
 10551  Compound expressions are handled by defining operators taking subexpressions
 10552  as their arguments, like this:
 10553  
 10554  @example
 10555  template <class T, class U>
 10556  __gmp_expr
 10557  <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
 10558  operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
 10559  @{
 10560    return __gmp_expr
 10561      <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
 10562      (expr1, expr2);
 10563  @}
 10564  @end example
 10565  
 10566  And the corresponding specializations of @code{__gmp_expr::eval}:
 10567  
 10568  @example
 10569  template <class T, class U, class Op>
 10570  void __gmp_expr
 10571  <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
 10572  (mpf_t f, mp_bitcnt_t precision)
 10573  @{
 10574    // declare two temporaries
 10575    mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
 10576    Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
 10577  @}
 10578  @end example
 10579  
 10580  The expression is thus recursively evaluated to any level of complexity and
 10581  all subexpressions are evaluated to the precision of @code{f}.
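
The deferred evaluation above can be tried out in miniature without GMP.  The
following sketch is a toy under stated assumptions, not the @file{gmpxx.h}
code: @code{Var}, @code{Expr}, @code{Leaf}, @code{Binary} and
@code{truncate_to} are hypothetical names, values are integer counts of
thousandths, and ``precision'' is the number of decimal places a variable
keeps.

```cpp
#include <cassert>

// Toy model only: every name here is hypothetical, none is from gmpxx.h.

// Keep `places` decimal places (0 to 3) of a value held in thousandths.
long truncate_to(long milli, int places)
{
  assert(places >= 0 && places <= 3);
  long unit = 1;
  for (int i = places; i < 3; ++i)
    unit *= 10;
  return milli - milli % unit;
}

// Function objects, in the role of __gmp_binary_plus and friends.
struct Plus  { static long eval(long a, long b) { return a + b; } };
struct Times { static long eval(long a, long b) { return a * b / 1000; } };

// Single wrapper type for any kind of expression, like __gmp_expr.
template <class T>
struct Expr
{
  T node;
  long eval(int places) const { return node.eval(places); }
};

// A plain value, read at whatever precision the destination asks for.
struct Leaf
{
  long milli;
  long eval(int places) const { return truncate_to(milli, places); }
};

// A pending operation; nothing is computed until eval is called.
template <class L, class R, class Op>
struct Binary
{
  Expr<L> lhs;
  Expr<R> rhs;
  long eval(int places) const
  {
    // Both temporaries are formed at the destination's precision.
    long a = lhs.eval(places), b = rhs.eval(places);
    return truncate_to(Op::eval(a, b), places);
  }
};

template <class L, class R>
Expr<Binary<L, R, Plus> > operator+(const Expr<L> &a, const Expr<R> &b)
{
  return {{a, b}};
}

template <class L, class R>
Expr<Binary<L, R, Times> > operator*(const Expr<L> &a, const Expr<R> &b)
{
  return {{a, b}};
}

// A variable with its own precision, in the role of mpf_class.
struct Var : Expr<Leaf>
{
  int places;
  Var(long milli, int places)
    : Expr<Leaf>{Leaf{truncate_to(milli, places)}}, places(places) {}

  // Evaluation happens here, once the destination precision is known.
  template <class T>
  Var &operator=(const Expr<T> &e)
  {
    node.milli = e.eval(places);
    return *this;
  }
};
```

With @code{w}, @code{x}, @code{y} and @code{z} holding 1.234, 2, 0.555 and 3
to three places, assigning @code{w*x+y*z} to a one-place variable gives 3.9,
while a three-place destination gives 4.133.  In this toy the operands
themselves are also truncated when read, which is a simplification.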
 10582  
 10583  
 10584  @node Contributors, References, Internals, Top
 10585  @comment  node-name,  next,  previous,  up
 10586  @appendix Contributors
 10587  @cindex Contributors
 10588  
 10589  Torbj@"orn Granlund wrote the original GMP library and is still the main
 10590  developer.  Code not explicitly attributed to others was contributed by
 10591  Torbj@"orn.  Several other individuals and organizations have contributed
 10592  to GMP.  Here is a list in chronological order of first contribution:
 10593  
 10594  Gunnar Sj@"odin and Hans Riesel helped with mathematical problems in early
 10595  versions of the library.
 10596  
 10597  Richard Stallman helped with the interface design and revised the first
 10598  version of this manual.
 10599  
 10600  Brian Beuning and Doug Lea helped with testing of early versions of the
 10601  library and made creative suggestions.
 10602  
 10603  John Amanatides of York University in Canada contributed the function
 10604  @code{mpz_probab_prime_p}.
 10605  
 10606  Paul Zimmermann wrote the REDC-based @code{mpz_powm} code, the Sch@"onhage-Strassen
 10607  FFT multiply code, and the Karatsuba square root code.  He also improved the
 10608  Toom3 code for GMP 4.2.  Paul sparked the development of GMP 2, with his
 10609  comparisons between bignum packages.  The ECMNET project Paul is organizing
 10610  was a driving force behind many of the optimizations in GMP 3.  Paul also
 10611  wrote the new GMP 4.3 nth root code (with Torbj@"orn).
 10612  
 10613  Ken Weber (Kent State University, Universidade Federal do Rio Grande do Sul)
 10614  contributed now-defunct versions of @code{mpz_gcd}, @code{mpz_divexact},
 10615  @code{mpn_gcd}, and @code{mpn_bdivmod}, partially supported by CNPq (Brazil)
 10616  grant 301314194-2.
 10617  
 10618  Per Bothner of Cygnus Support helped to set up GMP to use Cygnus' configure.
 10619  He has also made valuable suggestions and tested numerous intermediary
 10620  releases.
 10621  
 10622  Joachim Hollman was involved in the design of the @code{mpf} interface, and in
 10623  the @code{mpz} design revisions for version 2.
 10624  
 10625  Bennet Yee contributed the initial versions of @code{mpz_jacobi} and
 10626  @code{mpz_legendre}.
 10627  
 10628  Andreas Schwab contributed the files @file{mpn/m68k/lshift.S} and
 10629  @file{mpn/m68k/rshift.S} (now in @file{.asm} form).
 10630  
 10631  Robert Harley of Inria, France and David Seal of ARM, England, suggested clever
 10632  improvements for population count.  Robert also wrote highly optimized
 10633  Karatsuba and 3-way Toom multiplication functions for GMP 3, and contributed
 10634  the ARM assembly code.
 10635  
 10636  Torsten Ekedahl of the mathematics department of Stockholm University provided
 10637  significant inspiration during several phases of the GMP development.  His
 10638  mathematical expertise helped improve several algorithms.
 10639  
 10640  Linus Nordberg wrote the new configure system based on autoconf and
 10641  implemented the new random functions.
 10642  
 10643  Kevin Ryde worked on a large number of things: optimized x86 code, m4 asm
 10644  macros, parameter tuning, speed measuring, the configure system, function
 10645  inlining, divisibility tests, bit scanning, Jacobi symbols, Fibonacci and Lucas
 10646  number functions, printf and scanf functions, perl interface, demo expression
 10647  parser, the algorithms chapter in the manual, @file{gmpasm-mode.el}, and
 10648  various miscellaneous improvements elsewhere.
 10649  
 10650  Kent Boortz made the Mac OS 9 port.
 10651  
 10652  Steve Root helped write the optimized alpha 21264 assembly code.
 10653  
 10654  Gerardo Ballabio wrote the @file{gmpxx.h} C++ class interface and the C++
 10655  @code{istream} input routines.
 10656  
 10657  Jason Moxham rewrote @code{mpz_fac_ui}.
 10658  
 10659  Pedro Gimeno implemented the Mersenne Twister and made other random number
 10660  improvements.
 10661  
 10662  Niels M@"oller wrote the sub-quadratic GCD, extended GCD and Jacobi code, the
 10663  quadratic Hensel division code, and (with Torbj@"orn) the new divide and
 10664  conquer division code for GMP 4.3.  Niels also helped implement the new Toom
 10665  multiply code for GMP 4.3 and implemented helper functions to simplify Toom
 10666  evaluations for GMP 5.0.  He wrote the original version of @code{mpn_mulmod_bnm1},
 10667  and he is the main author of the mini-gmp package used for GMP bootstrapping.
 10668  
 10669  Alberto Zanoni and Marco Bodrato suggested the unbalanced multiply strategy,
 10670  and found the optimal strategies for evaluation and interpolation in Toom
 10671  multiplication.
 10672  
 10673  Marco Bodrato helped implement the new Toom multiply code for GMP 4.3 and
 10674  implemented most of the new Toom multiply and squaring code for GMP 5.0.
 10675  He is the main author of the current @code{mpn_mulmod_bnm1}, @code{mpn_mullo_n},
 10676  and @code{mpn_sqrlo}.  Marco also wrote the functions @code{mpn_invert} and
 10677  @code{mpn_invertappr}, and improved the speed of integer root extraction.  He is
 10678  the author of the current combinatorial functions: binomial, factorial,
 10679  multifactorial, primorial.
 10680  
 10681  David Harvey suggested the internal function @code{mpn_bdiv_dbm1}, implementing
 10682  division relevant to Toom multiplication.  He also worked on fast assembly
 10683  sequences, in particular on a fast AMD64 @code{mpn_mul_basecase}.  He wrote
 10684  the internal middle product functions @code{mpn_mulmid_basecase},
 10685  @code{mpn_toom42_mulmid}, @code{mpn_mulmid_n} and related helper routines.
 10686  
 10687  Martin Boij wrote @code{mpn_perfect_power_p}.
 10688  
 10689  Marc Glisse improved @file{gmpxx.h}: he reduced the use of temporaries (for
 10690  speed), added specializations of @code{numeric_limits} and @code{common_type},
 10691  added C++11 features (move constructors, explicit bool conversion, UDL), made
 10692  the conversion from @code{mpq_class} to @code{mpz_class} explicit, optimized
 10693  operations where one argument is a small compile-time constant, and replaced
 10694  some heap allocations by stack allocations.  He also fixed the eofbit
 10695  handling of C++ streams, and removed one division from @file{mpq/aors.c}.
 10696  
 10697  David S Miller wrote assembly code for SPARC T3 and T4.
 10698  
 10699  Mark Sofroniou cleaned up the types of @file{mul_fft.c}, letting it work for huge
 10700  operands.
 10701  
 10702  Ulrich Weigand ported GMP to the powerpc64le ABI.
 10703  
 10704  (This list is chronological, not ordered by significance.  If you have
 10705  contributed to GMP but are not listed above, please tell
 10706  @email{gmp-devel@@gmplib.org} about the omission!)
 10707  
 10708  The development of the floating point functions of GNU MP 2 was supported in part
 10709  by the ESPRIT-BRA (Basic Research Activities) 6846 project POSSO (POlynomial
 10710  System SOlving).
 10711  
 10712  The development of GMP 2, 3, and 4.0 was supported in part by the IDA Center
 10713  for Computing Sciences.
 10714  
 10715  The development of GMP 4.3, 5.0, and 5.1 was supported in part by the Swedish
 10716  Foundation for Strategic Research.
 10717  
 10718  Thanks go to Hans Thorsen for donating an SGI system for the GMP test system
 10719  environment.
 10720  
 10721  @node References, GNU Free Documentation License, Contributors, Top
 10722  @comment  node-name,  next,  previous,  up
 10723  @appendix References
 10724  @cindex References
 10725  
 10726  @c  FIXME: In tex, the @uref's are unhyphenated, which is good for clarity,
 10727  @c  but being long words they upset paragraph formatting (the preceding line
 10728  @c  can get badly stretched).  Would like a conditional @* style line break
 10729  @c  if the uref is too long to fit on the last line of the paragraph, but it's
 10730  @c  not clear how to do that.  For now explicit @texlinebreak{}s are used on
 10731  @c  paragraphs that come out bad.
 10732  
 10733  @section Books
 10734  
 10735  @itemize @bullet
 10736  @item
 10737  Jonathan M. Borwein and Peter B. Borwein, ``Pi and the AGM: A Study in
 10738  Analytic Number Theory and Computational Complexity'', Wiley, 1998.
 10739  
 10740  @item
 10741  Richard Crandall and Carl Pomerance, ``Prime Numbers: A Computational
 10742  Perspective'', 2nd edition, Springer-Verlag, 2005.
 10743  @texlinebreak{} @uref{http://www.math.dartmouth.edu/~carlp/}
 10744  
 10745  @item
 10746  Henri Cohen, ``A Course in Computational Algebraic Number Theory'', Graduate
 10747  Texts in Mathematics number 138, Springer-Verlag, 1993.
 10748  @texlinebreak{} @uref{http://www.math.u-bordeaux.fr/~cohen/}
 10749  
 10750  @item
 10751  Donald E. Knuth, ``The Art of Computer Programming'', volume 2,
 10752  ``Seminumerical Algorithms'', 3rd edition, Addison-Wesley, 1998.
 10753  @texlinebreak{} @uref{http://www-cs-faculty.stanford.edu/~knuth/taocp.html}
 10754  
 10755  @item
 10756  John D. Lipson, ``Elements of Algebra and Algebraic Computing'',
 10757  The Benjamin Cummings Publishing Company Inc, 1981.
 10758  
 10759  @item
 10760  Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, ``Handbook of
 10761  Applied Cryptography'', @uref{http://www.cacr.math.uwaterloo.ca/hac/}
 10762  
 10763  @item
 10764  Richard M. Stallman and the GCC Developer Community, ``Using the GNU Compiler
 10765  Collection'', Free Software Foundation, 2008, available online
 10766  @uref{https://gcc.gnu.org/onlinedocs/}, and in the GCC package
 10767  @uref{https://ftp.gnu.org/gnu/gcc/}
 10768  @end itemize
 10769  
 10770  @section Papers
 10771  
 10772  @itemize @bullet
 10773  @item
 10774  Yves Bertot, Nicolas Magaud and Paul Zimmermann, ``A Proof of GMP Square
 10775  Root'', Journal of Automated Reasoning, volume 29, 2002, pp.@: 225-252.  Also
 10776  available online as INRIA Research Report 4475, June 2002,
 10777  @uref{http://hal.inria.fr/docs/00/07/21/13/PDF/RR-4475.pdf}
 10778  
 10779  @item
 10780  Christoph Burnikel and Joachim Ziegler, ``Fast Recursive Division'',
 10781  Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
 10782  @texlinebreak{} @uref{http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022}
 10783  
 10784  @item
 10785  Torbj@"orn Granlund and Peter L. Montgomery, ``Division by Invariant Integers
 10786  using Multiplication'', in Proceedings of the SIGPLAN PLDI'94 Conference, June
 10787  1994.  Also available @uref{https://gmplib.org/~tege/divcnst-pldi94.pdf}.
 10788  
 10789  @item
 10790  Niels M@"oller and Torbj@"orn Granlund, ``Improved division by invariant
 10791  integers'', IEEE Transactions on Computers, 11 June 2010.
 10792  @uref{https://gmplib.org/~tege/division-paper.pdf}
 10793  
 10794  @item
 10795  Torbj@"orn Granlund and Niels M@"oller, ``Division of integers large and
 10796  small'', to appear.
 10797  
 10798  @item
 10799  Tudor Jebelean,
 10800  ``An algorithm for exact division'',
 10801  Journal of Symbolic Computation,
 10802  volume 15, 1993, pp.@: 169-180.
 10803  Research report version available @texlinebreak{}
 10804  @uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz}
 10805  
 10806  @item
 10807  Tudor Jebelean, ``Exact Division with Karatsuba Complexity - Extended
 10808  Abstract'', RISC-Linz technical report 96-31, @texlinebreak{}
 10809  @uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz}
 10810  
 10811  @item
 10812  Tudor Jebelean, ``Practical Integer Division with Karatsuba Complexity'',
 10813  ISSAC 97, pp.@: 339-341.  Technical report available @texlinebreak{}
 10814  @uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz}
 10815  
 10816  @item
 10817  Tudor Jebelean, ``A Generalization of the Binary GCD Algorithm'', ISSAC 93,
 10818  pp.@: 111-116.  Technical report version available @texlinebreak{}
 10819  @uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz}
 10820  
 10821  @item
 10822  Tudor Jebelean, ``A Double-Digit Lehmer-Euclid Algorithm for Finding the GCD
 10823  of Long Integers'', Journal of Symbolic Computation, volume 19, 1995,
 10824  pp.@: 145-157.  Technical report version also available @texlinebreak{}
 10825  @uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz}
 10826  
 10827  @item
 10828  Werner Krandick and Tudor Jebelean, ``Bidirectional Exact Integer Division'',
 10829  Journal of Symbolic Computation, volume 21, 1996, pp.@: 441-455.  Early
 10830  technical report version also available
 10831  @uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz}
 10832  
 10833  @item
 10834  Makoto Matsumoto and Takuji Nishimura, ``Mersenne Twister: A 623-dimensionally
 10835  equidistributed uniform pseudorandom number generator'', ACM Transactions on
 10836  Modelling and Computer Simulation, volume 8, January 1998, pp.@: 3-30.
 10837  Available online @texlinebreak{}
 10838  @uref{http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/mt.ps.gz} (or .pdf)
 10839  
 10840  @item
 10841  R. Moenck and A. Borodin, ``Fast Modular Transforms via Division'',
 10842  Proceedings of the 13th Annual IEEE Symposium on Switching and Automata
 10843  Theory, October 1972, pp.@: 90-96.  Reprinted as ``Fast Modular Transforms'',
 10844  Journal of Computer and System Sciences, volume 8, number 3, June 1974,
 10845  pp.@: 366-386.
 10846  
 10847  @item
 10848  Niels M@"oller, ``On Sch@"onhage's algorithm and subquadratic integer GCD
 10849  computation'', in Mathematics of Computation, volume 77, January 2008,
 10850  pp.@: 589-607.
 10851  
 10852  @item
 10853  Peter L. Montgomery, ``Modular Multiplication Without Trial Division'', in
 10854  Mathematics of Computation, volume 44, number 170, April 1985.
 10855  
 10856  @item
 10857  Arnold Sch@"onhage and Volker Strassen, ``Schnelle Multiplikation grosser
 10858  Zahlen'', Computing 7, 1971, pp.@: 281-292.
 10859  
 10860  @item
 10861  Kenneth Weber, ``The accelerated integer GCD algorithm'',
 10862  ACM Transactions on Mathematical Software,
 10863  volume 21, number 1, March 1995, pp.@: 111-122.
 10864  
 10865  @item
 10866  Paul Zimmermann, ``Karatsuba Square Root'', INRIA Research Report 3805,
 10867  November 1999, @uref{http://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf}
 10868  
 10869  @item
 10870  Paul Zimmermann, ``A Proof of GMP Fast Division and Square Root
 10871  Implementations'', @texlinebreak{}
 10872  @uref{http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz}
 10873  
 10874  @item
 10875  Dan Zuras, ``On Squaring and Multiplying Large Integers'', ARITH-11: IEEE
 10876  Symposium on Computer Arithmetic, 1993, pp.@: 260-271.  Reprinted as ``More
 10877  on Multiplying and Squaring Large Integers'', IEEE Transactions on Computers,
 10878  volume 43, number 8, August 1994, pp.@: 899-908.
 10879  @end itemize
 10880  
 10881  
 10882  @node GNU Free Documentation License, Concept Index, References, Top
 10883  @appendix GNU Free Documentation License
 10884  @cindex GNU Free Documentation License
 10885  @cindex Free Documentation License
 10886  @cindex Documentation license
 10887  @include fdl-1.3.texi
 10888  
 10889  
 10890  @node Concept Index, Function Index, GNU Free Documentation License, Top
 10891  @comment  node-name,  next,  previous,  up
 10892  @unnumbered Concept Index
 10893  @printindex cp
 10894  
 10895  @node Function Index,  , Concept Index, Top
 10896  @comment  node-name,  next,  previous,  up
 10897  @unnumbered Function and Type Index
 10898  @printindex fn
 10899  
 10900  @bye
 10901  
 10902  @c Local variables:
 10903  @c fill-column: 78
 10904  @c compile-command: "make gmp.info"
 10905  @c End: