github.com/aergoio/aergo@v1.3.1/libtool/src/gmp-6.1.2/doc/projects.html (about)

     1  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
     2  <html>
     3  <head>
     4    <title>GMP Development Projects</title>
     5    <link rel="shortcut icon" href="favicon.ico">
     6    <link rel="stylesheet" href="gmp.css">
     7    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
     8  </head>
     9  
    10  <center>
    11    <h1>
    12      GMP Development Projects
    13    </h1>
    14  </center>
    15  
    16  <font size=-1>
    17  <pre>
    18  Copyright 2000-2006, 2008-2011 Free Software Foundation, Inc.
    19  
    20  This file is part of the GNU MP Library.
    21  
    22  The GNU MP Library is free software; you can redistribute it and/or modify
    23  it under the terms of either:
    24  
    25    * the GNU Lesser General Public License as published by the Free
    26      Software Foundation; either version 3 of the License, or (at your
    27      option) any later version.
    28  
    29  or
    30  
    31    * the GNU General Public License as published by the Free Software
    32      Foundation; either version 2 of the License, or (at your option) any
    33      later version.
    34  
    35  or both in parallel, as here.
    36  
    37  The GNU MP Library is distributed in the hope that it will be useful, but
    38  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
    39  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
    40  for more details.
    41  
    42  You should have received copies of the GNU General Public License and the
    43  GNU Lesser General Public License along with the GNU MP Library.  If not,
    44  see https://www.gnu.org/licenses/.
    45  </pre>
    46  </font>
    47  
    48  <hr>
    49  <!-- NB. timestamp updated automatically by emacs -->
    50    This file current as of 29 Jan 2014.  An up-to-date version is available at
    51    <a href="https://gmplib.org/projects.html">https://gmplib.org/projects.html</a>.
    52    Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
    53  
    54  <p> This file lists projects suitable for volunteers.  Please see the
    55      <a href="tasks.html">tasks file</a> for smaller tasks.
    56  
    57  <p> If you want to work on any of the projects below, please let
    58      gmp-devel<font>@</font>gmplib.org know.  If you want to help with a project
    59      that already somebody else is working on, you will get in touch through
    60      gmp-devel<font>@</font>gmplib.org.  (There are no email addresses of
    61      volunteers below, due to spamming problems.)
    62  
    63  <ul>
    64  <li> <strong>Faster multiplication</strong>
    65  
    66    <ol>
    67  
    68      <li> Work on the algorithm selection code for unbalanced multiplication.
    69  
    70      <li> Implement an FFT variant computing the coefficients mod m different
    71  	 limb size primes of the form l*2^k+1. i.e., compute m separate FFTs.
    72  	 The wanted coefficients will at the end be found by lifting with CRT
    73  	 (Chinese Remainder Theorem).  If we let m = 3, i.e., use 3 primes, we
    74  	 can split the operands into coefficients at limb boundaries, and if
    75  	 our machine uses b-bit limbs, we can multiply numbers with close to
    76  	 2^b limbs without coefficient overflow.  For smaller multiplication,
    77  	 we might perhaps let m = 1, and instead of splitting our operands at
    78  	 limb boundaries, split them in much smaller pieces.  We might also use
    79  	 4 or more primes, and split operands into bigger than b-bit chunks.
    80  	 By using more primes, the gain in shorter transform length, but lose
    81  	 in having to do more FFTs, but that is a slight total save.  We then
    82  	 lose in more expensive CRT. <br><br>
    83  
    84  	 <p> [We now have two implementations of this algorithm, one by Tommy
    85  	 Färnqvist and one by Niels Möller.]
    86  
    87      <li> Work on short products.  Our mullo and mulmid are probably K, but we
    88           lack mulhi.
    89  
    90    </ol>
    91  
    92    <p> Another possibility would be an optimized cube.  In the basecase that
    93        should definitely be able to save cross-products in a similar fashion to
    94        squaring, but some investigation might be needed for how best to adapt
    95        the higher-order algorithms.  Not sure whether cubing or further small
    96        powers have any particularly important uses though.
    97  
    98  
    99  <li> <strong>Assembly routines</strong>
   100  
   101    <p> Write new and improve existing assembly routines.  The tests/devel
   102        programs and the tune/speed.c and tune/many.pl programs are useful for
   103        testing and timing the routines you write.  See the README files in those
   104        directories for more information.
   105  
   106    <p> Please make sure your new routines are fast for these three situations:
   107        <ol>
   108  	<li> Small operands of less than, say, 10 limbs.
   109  	<li> Medium size operands, that fit into the cache.
   110  	<li> Huge operands that does not fit into the cache.
   111        </ol>
   112  
   113    <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
   114        mpn_sqr_basecase.  The latter two don't exist for all machines, while
   115        mpn_addmul_1 exists for almost all machines.
   116  
   117    <p> Standard techniques for these routines are unrolling, software
   118        pipelining, and specialization for common operand values.  For machines
   119        with poor integer multiplication, it is sometimes possible to remedy the
   120        situation using floating-point operations or SIMD operations such as MMX
   121        (x86) (x86), SSE (x86), VMX (PowerPC), VIS (Sparc).
   122  
   123    <p> Using floating-point operations is interesting but somewhat tricky.
   124        Since IEEE double has 53 bit of mantissa, one has to split the operands
   125        in small pieces, so that no intermediates are greater than 2^53.  For
   126        32-bit computers, splitting one operand into 16-bit pieces works.  For
   127        64-bit machines, one operand can be split into 21-bit pieces and the
   128        other into 32-bit pieces.  (A 64-bit operand can be split into just three
   129        21-bit pieces if one allows the split operands to be negative!)
   130  
   131  
   132  <li> <strong>Faster sqrt</strong>
   133  
   134    <p> The current code uses divisions, which are reasonably fast, but it'd be
   135        possible to use only multiplications by computing 1/sqrt(A) using this
   136        iteration:
   137        <pre>
   138  				    2
   139  		   x   = x  (3 &minus; A x )/2
   140  		    i+1	  i	    i  </pre>
   141        The square root can then be computed like this:
   142        <pre>
   143  		     sqrt(A) = A x
   144  				  n  </pre>
   145    <p> That final multiply might be the full size of the input (though it might
   146        only need the high half of that), so there may or may not be any speedup
   147        overall.
   148  
   149    <p> We should probably allow a special exponent-like parameter, to speed
   150        computations of a precise square root of a small number in mpf and mpfr.
   151  
   152  
   153  <li> <strong>Nth root</strong>
   154  
   155    <p> Improve mpn_rootrem.  The current code is not too bad, but its time
   156        complexity is a function of the input, while it is possible to make
   157        the <i>average</i> complexity a function of the output.
   158  
   159  
   160  <li> <strong>Fat binaries</strong>
   161  
   162    <p> Add more functions to the set of fat functions.
   163  
   164    <p> The speed of multiplication is today highly dependent on combination
   165    functions like <code>addlsh1_n</code>.  A fat binary will never use any such
   166    functions, since they are classified as optional.  Ideally, we should use
   167    them, but making the current compile-time selections of optional functions
   168    become run-time selections for fat binaries.
   169  
   170    <p> If we make fat binaries work really well, we should move away frm tehe
   171    current configure scheme (at least by default) and instead include all code
   172    always.
   173  
   174  
   175  <li> <strong>Exceptions</strong>
   176  
   177    <p> Some sort of scheme for exceptions handling would be desirable.
   178        Presently the only thing documented is that divide by zero in GMP
   179        functions provokes a deliberate machine divide by zero (on those systems
   180        where such a thing exists at least).  The global <code>gmp_errno</code>
   181        is not actually documented, except for the old <code>gmp_randinit</code>
   182        function.  Being currently just a plain global means it's not
   183        thread-safe.
   184  
   185    <p> The basic choices for exceptions are returning an error code or having a
   186        handler function to be called.  The disadvantage of error returns is they
   187        have to be checked, leading to tedious and rarely executed code, and
   188        strictly speaking such a scheme wouldn't be source or binary compatible.
   189        The disadvantage of a handler function is that a <code>longjmp</code> or
   190        similar recovery from it may be difficult.  A combination would be
   191        possible, for instance by allowing the handler to return an error code.
   192  
   193    <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
   194        normally be detected at the start of functions, so exception handling
   195        would have a clean state.  What's worth considering though is that the
   196        GMP function detecting the exception may have been called via some third
   197        party library or self contained application module, and hence have
   198        various bits of state to be cleaned up above it.  It'd be highly
   199        desirable for an exceptions scheme to allow for such cleanups.
   200  
   201    <p> The C++ destructor mechanism could help with cleanups both internally and
   202        externally, but being a plain C library we don't want to depend on that.
   203  
   204    <p> A C++ <code>throw</code> might be a good optional extra exceptions
   205        mechanism, perhaps under a build option.  For
   206        GCC <code>-fexceptions</code> will add the necessary frame information to
   207        plain C code, or GMP could be compiled as C++.
   208  
   209    <p> Out-of-memory exceptions are expected to be handled by the
   210        <code>mp_set_memory_functions</code> routines, rather than being a
   211        prospective part of divide-by-zero etc.  Some similar considerations
   212        apply but what differs is that out-of-memory can arise deep within GMP
   213        internals.  Even fundamental routines like <code>mpn_add_n</code> and
   214        <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
   215        vector systems).  Allowing for an error code return would require an
   216        awful lot of checking internally.  Perhaps it'd still be worthwhile, but
   217        it'd be a lot of changes and the extra code would probably be rather
   218        rarely executed in normal usages.
   219  
   220    <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
   221        general, lead to memory leaks and may leave GMP variables operated on in
   222        inconsistent states.  Maybe it'd be possible to record recovery
   223        information for use by the relevant allocate or reallocate function, but
   224        that too would be a lot of changes.
   225  
   226    <p> One scheme for out-of-memory would be to note that all GMP allocations go
   227        through the <code>mp_set_memory_functions</code> routines.  So if the
   228        application has an intended <code>setjmp</code> recovery point it can
   229        record memory activity by GMP and abandon space allocated and variables
   230        initialized after that point.  This might be as simple as directing the
   231        allocation functions to a separate pool, but in general would have the
   232        disadvantage of needing application-level bookkeeping on top of the
   233        normal system <code>malloc</code>.  An advantage however is that it needs
   234        nothing from GMP itself and on that basis doesn't burden applications not
   235        needing recovery.  Note that there's probably some details to be worked
   236        out here about reallocs of existing variables, and perhaps about copying
   237        or swapping between "permanent" and "temporary" variables.
   238  
   239    <p> Applications desiring a fine-grained error control, for instance a
   240        language interpreter, would very possibly not be well served by a scheme
   241        requiring <code>longjmp</code>.  Wrapping every GMP function call with a
   242        <code>setjmp</code> would be very inconvenient.
   243  
   244    <p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN,
   245        a special value indicating an out-of-memory or other failure.  This would
   246        be similar to NaNs in mpfr.  Unfortunately such a scheme could only be
   247        used by programs prepared to handle such special values, since for
   248        instance a program waiting for some condition to be satisfied could
   249        become an infinite loop if it wasn't also watching for NaNs.  The work to
   250        implement this would be significant too, lots of checking of inputs and
   251        intermediate results.  And if <code>mpn</code> routines were to
   252        participate in this (which they would have to internally) a lot of new
   253        return values would need to be added, since of course there's no
   254        <code>mpz_t</code> etc structure for them to indicate failure in.
   255  
   256    <p> Stack overflow is another possible exception, but perhaps not one that
   257        can be easily detected in general.  On i386 GNU/Linux for instance GCC
   258        normally doesn't generate stack probes for an <code>alloca</code>, but
   259        merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
   260        miss the stack redzone and hit arbitrary data.  GMP stack usage is
   261        normally a function of operand size, which might be enough for some
   262        applications to know they'll be safe.  Otherwise a fixed maximum usage
   263        can probably be obtained by building with
   264        <code>--enable-alloca=malloc-reentrant</code> (or
   265        <code>notreentrant</code>).  Arranging the default to be
   266        <code>alloca</code> only on blocks up to a certain size and
   267        <code>malloc</code> thereafter might be a better approach and would have
   268        the advantage of not having calculations limited by available stack.
   269  
   270    <p> Actually recovering from stack overflow is of course another problem.  It
   271        might be possible to catch a <code>SIGSEGV</code> in the stack redzone
   272        and do something in a <code>sigaltstack</code>, on systems which have
   273        that, but recovery might otherwise not be possible.  This is worth
   274        bearing in mind because there's no point worrying about tight and careful
   275        out-of-memory recovery if an out-of-stack is fatal.
   276  
   277    <p> Operand overflow is another exception to be addressed.  It's easy for
   278        instance to ask <code>mpz_pow_ui</code> for a result bigger than an
   279        <code>mpz_t</code> can possibly represent.  Currently overflows in limb
   280        or byte count calculations will go undetected.  Often they'll still end
   281        up asking the memory functions for blocks bigger than available memory,
   282        but that's by no means certain and results are unpredictable in general.
   283        It'd be desirable to tighten up such size calculations.  Probably only
   284        selected routines would need checks, if it's assumed say that no input
   285        will be more than half of all memory and hence size additions like say
   286        <code>mpz_mul</code> won't overflow.
   287  
   288  
   289  <li> <strong>Performance Tool</strong>
   290  
   291    <p> It'd be nice to have some sort of tool for getting an overview of
   292        performance.  Clearly a great many things could be done, but some primary
   293        uses would be,
   294  
   295        <ol>
   296  	<li> Checking speed variations between compilers.
   297  	<li> Checking relative performance between systems or CPUs.
   298        </ol>
   299  
   300    <p> A combination of measuring some fundamental routines and some
   301        representative application routines might satisfy these.
   302  
   303    <p> The tune/time.c routines would be the easiest way to get good accurate
   304        measurements on lots of different systems.  The high level
   305        <code>speed_measure</code> may or may not suit, but the basic
   306        <code>speed_starttime</code> and <code>speed_endtime</code> would cover
   307        lots of portability and accuracy questions.
   308  
   309  
   310  <li> <strong>Using <code>restrict</code></strong>
   311  
   312    <p> There might be some value in judicious use of C99 style
   313        <code>restrict</code> on various pointers, but this would need some
   314        careful thought about what it implies for the various operand overlaps
   315        permitted in GMP.
   316  
   317    <p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but
   318        expressing tighter (or perhaps looser) requirements.  Might be worth
   319        investigating that before using <code>restrict</code> unconditionally.
   320  
   321    <p> Loops are presumably where the greatest benefit would be had, by allowing
   322        the compiler to advance reads ahead of writes, perhaps as part of loop
   323        unrolling.  However critical loops are generally coded in assembler, so
   324        there might not be very much to gain.  And on Cray systems the explicit
   325        use of <code>_Pragma</code> gives an equivalent effect.
   326  
   327    <p> One thing to note is that Microsoft C headers (on ia64 at least) contain
   328        <code>__declspec(restrict)</code>, so a <code>#define</code> of
   329        <code>restrict</code> should be avoided.  It might be wisest to setup a
   330        <code>gmp_restrict</code>.
   331  
   332  
   333  <li> <strong>Prime Testing</strong>
   334  
   335    <p> GMP is not really a number theory library and probably shouldn't have
   336        large amounts of code dedicated to sophisticated prime testing
   337        algorithms, but basic things well-implemented would suit.  Tests offering
   338        certainty are probably all too big or too slow (or both!) to justify
   339        inclusion in the main library.  Demo programs showing some possibilities
   340        would be good though.
   341  
   342    <p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is
   343        rather specific to the Miller-Rabin tests of the current implementation.
   344        Better would be some sort of parameter asking perhaps for a maximum
   345        chance 1/2^x of a probable prime in fact being composite.  If
   346        applications follow the advice that the present reps gives 1/4^reps
   347        chance then perhaps such a change is unnecessary, but an explicitly
   348        described 1/2^x would allow for changes in the implementation or even for
   349        new proofs about the theory.
   350  
   351    <p> <code>mpz_probab_prime_p</code> always initializes a new
   352        <code>gmp_randstate_t</code> for randomized tests, which unfortunately
   353        means it's not really very random and in particular always runs the same
   354        tests for a given input.  Perhaps a new interface could accept an rstate
   355        to use, so successive tests could increase confidence in the result.
   356  
   357    <p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the
   358        trial divisions.  And since the various prime factors are constants, the
   359        remainder can be tested with something like
   360  <pre>
   361  #define MP_LIMB_DIVISIBLE_7_P(n) \
   362    ((n) * MODLIMB_INVERSE_7 &lt;= MP_LIMB_T_MAX/7)
   363  </pre>
   364        Which would help compilers that don't know how to optimize divisions by
   365        constants, and is even an improvement on current gcc 3.2 code.  This
   366        technique works for any modulus, see Granlund and Montgomery "Division by
   367        Invariant Integers" section 9.
   368  
   369    <p> The trial divisions are done with primes generated and grouped at
   370        runtime.  This could instead be a table of data, with pre-calculated
   371        inverses too.  Storing deltas, ie. amounts to add, rather than actual
   372        primes would save space.  <code>udiv_qrnnd_preinv</code> style inverses
   373        can be made to exist by adding dummy factors of 2 if necessary.  Some
   374        thought needs to be given as to how big such a table should be, based on
   375        how much dividing would be profitable for what sort of size inputs.  The
   376        data could be shared by the perfect power testing.
   377  
   378    <p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor
   379        of N must be == 1 mod 4, saving half the work in trial dividing.  (If
   380        x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the
   381        jacobi symbol (-1/p)=1.  But also (-1/p)=(-1)^((p-1)/2), hence must have
   382        p==1 mod 4.)  But knowing whether sqrt(-1) mod N exists is not too easy.
   383        A strong pseudoprime test can reveal one, so perhaps such a test could be
   384        inserted part way though the dividing.
   385  
   386    <p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a
   387        quadratic pseudoprime test taking about 3x longer than a plain test, but
   388        with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests
   389        would offer only (1/4)^3 == 1/64).  Such a test needs completely random
   390        parameters to satisfy the theory, though single-limb values would run
   391        faster.  It's probably best to do at least one plain Miller-Rabin before
   392        any quadratic tests, since that can identify composites in less total
   393        time.
   394  
   395    <p> Some thought needs to be given to the structure of which tests (trial
   396        division, Miller-Rabin, quadratic) and how many are done, based on what
   397        sort of inputs we expect, with a view to minimizing average time.
   398  
   399    <p> It might be a good idea to break out subroutines for the various tests,
   400        so that an application can combine them in ways it prefers, if sensible
   401        defaults in <code>mpz_probab_prime_p</code> don't suit.  In particular
   402        this would let applications skip tests it knew would be unprofitable,
   403        like trial dividing when an input is already known to have no small
   404        factors.
   405  
   406    <p> For small inputs, combinations of theory and explicit search make it
   407        relatively easy to offer certainty.  For instance numbers up to 2^32
   408        could be handled with a strong pseudoprime test and table lookup.  But
   409        it's rather doubtful whether a smallnum prime test belongs in a bignum
   410        library.  Perhaps if it had other internal uses.
   411  
   412    <p> An <code>mpz_nthprime</code> might be cute, but is almost certainly
   413        impractical for anything but small n.
   414  
   415  
   416  <li> <strong>Intra-Library Calls</strong>
   417  
   418    <p> On various systems, calls within libgmp still go through the PLT, TOC or
   419        other mechanism, which makes the code bigger and slower than it needs to
   420        be.
   421  
   422    <p> The theory would be to have all GMP intra-library calls resolved directly
   423        to the routines in the library.  An application wouldn't be able to
   424        replace a routine, the way it can normally, but there seems no good
   425        reason to do that, in normal circumstances.
   426  
   427    <p> The <code>visibility</code> attribute in recent gcc is good for this,
   428        because it lets gcc omit unnecessary GOT pointer setups or whatever if it
   429        finds all calls are local and there's no global data references.
   430        Documented entrypoints would be <code>protected</code>, and purely
   431        internal things not wanted by test programs or anything can be
   432        <code>internal</code>.
   433  
   434    <p> Unfortunately, on i386 it seems <code>protected</code> ends up causing
   435        text segment relocations within libgmp.so, meaning the library code can't
   436        be shared between processes, defeating the purpose of a shared library.
   437        Perhaps this is just a gremlin in binutils (debian packaged
   438        2.13.90.0.16-1).
   439  
   440    <p> The linker can be told directly (with a link script, or options) to do
   441        the same sort of thing.  This doesn't change the code emitted by gcc of
   442        course, but it does mean calls are resolved directly to their targets,
   443        avoiding a PLT entry.
   444  
   445    <p> Keeping symbols private to libgmp.so is probably a good thing in general
   446        too, to stop anyone even attempting to access them.  But some
   447        undocumented things will need or want to be kept visible, for use by
   448        mpfr, or the test and tune programs.  Libtool has a standard option for
   449        selecting public symbols (used now for libmp).
   450  
   451  
   452  <li> <strong>Math functions for the mpf layer</strong>
   453  
   454    <p> Implement the functions of math.h for the GMP mpf layer!	Check the book
   455        "Pi and the AGM" by Borwein and Borwein for ideas how to do this.  These
   456        functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
   457        cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
   458  
   459    <p> Note that the <a href="http://mpfr.org">mpfr</a> functions already
   460    provide these functions, and that we usually recommend new programs to use
   461    mpfr instead of mpf.
   462  </ul>
   463  <hr>
   464  
   465  </body>
   466  </html>
   467  
   468  <!--
   469  Local variables:
   470  eval: (add-hook 'write-file-hooks 'time-stamp)
   471  time-stamp-start: "This file current as of "
   472  time-stamp-format: "%:d %3b %:y"
   473  time-stamp-end: "\\."
   474  time-stamp-line-limit: 50
   475  End:
   476  -->