<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <title>GMP Development Projects</title>
  <link rel="shortcut icon" href="favicon.ico">
  <link rel="stylesheet" href="gmp.css">
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>

<center>
  <h1>
    GMP Development Projects
  </h1>
</center>

<font size=-1>
<pre>
Copyright 2000-2006, 2008-2011 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.
</pre>
</font>

<hr>
<!-- NB. timestamp updated automatically by emacs -->
This file current as of 29 Jan 2014.  An up-to-date version is available at
<a href="https://gmplib.org/projects.html">https://gmplib.org/projects.html</a>.
Please send comments about this page to gmp-devel<font>@</font>gmplib.org.

<p> This file lists projects suitable for volunteers.  Please see the
<a href="tasks.html">tasks file</a> for smaller tasks.

<p> If you want to work on any of the projects below, please let
gmp-devel<font>@</font>gmplib.org know.  If you want to help with a project
that somebody else is already working on, you can get in touch through
gmp-devel<font>@</font>gmplib.org.  (There are no email addresses of
volunteers below, due to spamming problems.)

<ul>
<li> <strong>Faster multiplication</strong>

  <ol>

    <li> Work on the algorithm selection code for unbalanced multiplication.

    <li> Implement an FFT variant computing the coefficients modulo m different
         limb-size primes of the form l*2^k+1, i.e., compute m separate FFTs.
         The wanted coefficients are then recovered at the end by lifting with
         the CRT (Chinese Remainder Theorem).  If we let m = 3, i.e., use 3
         primes, we can split the operands into coefficients at limb
         boundaries, and if our machine uses b-bit limbs, we can multiply
         numbers with close to 2^b limbs without coefficient overflow.  For
         smaller multiplications, we might perhaps let m = 1, and instead of
         splitting our operands at limb boundaries, split them into much
         smaller pieces.  We might also use 4 or more primes, and split
         operands into chunks bigger than b bits.  By using more primes, we
         gain a shorter transform length but lose by having to do more FFTs;
         on balance that is a slight saving.  We then lose again through a
         more expensive CRT. <br><br>

         <p> [We now have two implementations of this algorithm, one by Tommy
         Färnqvist and one by Niels Möller.]

    <li> Work on short products.  Our mullo and mulmid are probably OK, but we
         lack mulhi.

  </ol>

  <p> Another possibility would be an optimized cube.  In the basecase that
  should definitely be able to save cross-products in a similar fashion to
  squaring, but some investigation might be needed for how best to adapt
  the higher-order algorithms.  Not sure whether cubing or further small
  powers have any particularly important uses though.
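  <p> To make the idea of a short product concrete, here is a minimal
  schoolbook sketch of a low-half product.  It is an illustration only and is
  not GMP's mullo code: the name <code>sketch_mullo_basecase</code> is
  hypothetical, and it assumes 32-bit limbs stored least significant first,
  with the result area not overlapping either input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GMP function.
   Stores in rp[0..n-1] the low n limbs of {ap,n} * {bp,n}.
   Limbs are 32 bits, least significant limb first.
   rp must not overlap ap or bp. */
static void sketch_mullo_basecase(uint32_t *rp, const uint32_t *ap,
                                  const uint32_t *bp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        /* Only cross-products landing in columns below n are formed;
           this is where the saving over a full product comes from. */
        for (size_t j = 0; i + j < n; j++) {
            /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t)ap[j] * bp[i] + rp[i + j] + carry;
            rp[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        /* The carry out of column n-1 belongs to the high half
           and is deliberately dropped. */
    }
}
```

  <p> Roughly half of the limb multiplications of a full basecase product are
  skipped.  A mulhi cannot be obtained by simply mirroring this loop, since
  carries from the omitted low columns affect the high half.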


<li> <strong>Assembly routines</strong>

  <p> Write new and improve existing assembly routines.  The tests/devel
  programs and the tune/speed.c and tune/many.pl programs are useful for
  testing and timing the routines you write.  See the README files in those
  directories for more information.

  <p> Please make sure your new routines are fast for these three situations:
  <ol>
    <li> Small operands of less than, say, 10 limbs.
    <li> Medium size operands that fit into the cache.
    <li> Huge operands that do not fit into the cache.
  </ol>

  <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
  mpn_sqr_basecase.  The latter two don't exist for all machines, while
  mpn_addmul_1 exists for almost all machines.

  <p> Standard techniques for these routines are unrolling, software
  pipelining, and specialization for common operand values.  For machines
  with poor integer multiplication, it is sometimes possible to remedy the
  situation using floating-point operations or SIMD operations such as MMX
  (x86), SSE (x86), VMX (PowerPC), VIS (Sparc).

  <p> Using floating-point operations is interesting but somewhat tricky.
  Since IEEE double has 53 bits of mantissa, one has to split the operands
  into small pieces, so that no intermediates are greater than 2^53.  For
  32-bit computers, splitting one operand into 16-bit pieces works.  For
  64-bit machines, one operand can be split into 21-bit pieces and the
  other into 32-bit pieces.  (A 64-bit operand can be split into just three
  21-bit pieces if one allows the split operands to be negative!)
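  <p> As a small illustration of the 32-bit case, the sketch below multiplies
  two 32-bit values using only double-precision multiplies, splitting one
  operand into 16-bit pieces so that each partial product stays below 2^48
  and is therefore exact.  The function name is hypothetical and this is
  plain C rather than the assembly a real routine would use; it assumes
  <code>double</code> is IEEE binary64.

```c
#include <stdint.h>

/* Hypothetical illustration, not a GMP routine.
   Computes the full 64-bit product a * b using floating-point multiplies.
   b is split into two 16-bit pieces, so each partial product is below
   2^32 * 2^16 = 2^48 and is represented exactly in a 53-bit mantissa. */
static uint64_t sketch_mul32_via_double(uint32_t a, uint32_t b)
{
    double da = (double)a;
    double lo = da * (double)(b & 0xFFFFu);   /* a * low 16 bits of b  */
    double hi = da * (double)(b >> 16);       /* a * high 16 bits of b */

    /* Both conversions are exact; the recombination is done in integers.
       The sum equals a * b, which is below 2^64, so nothing overflows. */
    return (uint64_t)lo + ((uint64_t)hi << 16);
}
```

  <p> In a routine like mpn_addmul_1 the conversion and recombination costs
  dominate, which is part of what makes the technique tricky in practice.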


<li> <strong>Faster sqrt</strong>

  <p> The current code uses divisions, which are reasonably fast, but it'd be
  possible to use only multiplications by computing 1/sqrt(A) using this
  iteration:
  <pre>
x<sub>i+1</sub> = x<sub>i</sub> (3 − A x<sub>i</sub><sup>2</sup>)/2
</pre>
  The square root can then be computed like this:
  <pre>
sqrt(A) = A x<sub>n</sub>
</pre>
  <p> That final multiply might be the full size of the input (though it might
  only need the high half of that), so there may or may not be any speedup
  overall.

  <p> We should probably allow a special exponent-like parameter, to speed
  computations of a precise square root of a small number in mpf and mpfr.


<li> <strong>Nth root</strong>

  <p> Improve mpn_rootrem.  The current code is not too bad, but its time
  complexity is a function of the input, while it is possible to make
  the <i>average</i> complexity a function of the output.


<li> <strong>Fat binaries</strong>

  <p> Add more functions to the set of fat functions.

  <p> The speed of multiplication is today highly dependent on combination
  functions like <code>addlsh1_n</code>.  A fat binary will never use any such
  functions, since they are classified as optional.  Ideally, we should use
  them, by making the current compile-time selections of optional functions
  become run-time selections for fat binaries.

  <p> If we make fat binaries work really well, we should move away from the
  current configure scheme (at least by default) and instead include all code
  always.


<li> <strong>Exceptions</strong>

  <p> Some sort of scheme for exception handling would be desirable.
  Presently the only thing documented is that divide by zero in GMP
  functions provokes a deliberate machine divide by zero (on those systems
  where such a thing exists at least).
  The global <code>gmp_errno</code>
  is not actually documented, except for the old <code>gmp_randinit</code>
  function.  Being currently just a plain global means it's not
  thread-safe.

  <p> The basic choices for exceptions are returning an error code or having a
  handler function to be called.  The disadvantage of error returns is they
  have to be checked, leading to tedious and rarely executed code, and
  strictly speaking such a scheme wouldn't be source or binary compatible.
  The disadvantage of a handler function is that a <code>longjmp</code> or
  similar recovery from it may be difficult.  A combination would be
  possible, for instance by allowing the handler to return an error code.

  <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
  normally be detected at the start of functions, so exception handling
  would have a clean state.  What's worth considering though is that the
  GMP function detecting the exception may have been called via some third
  party library or self contained application module, and hence have
  various bits of state to be cleaned up above it.  It'd be highly
  desirable for an exceptions scheme to allow for such cleanups.

  <p> The C++ destructor mechanism could help with cleanups both internally and
  externally, but being a plain C library we don't want to depend on that.

  <p> A C++ <code>throw</code> might be a good optional extra exceptions
  mechanism, perhaps under a build option.  For
  GCC <code>-fexceptions</code> will add the necessary frame information to
  plain C code, or GMP could be compiled as C++.

  <p> Out-of-memory exceptions are expected to be handled by the
  <code>mp_set_memory_functions</code> routines, rather than being a
  prospective part of divide-by-zero etc.  Some similar considerations
  apply but what differs is that out-of-memory can arise deep within GMP
  internals.
  Even fundamental routines like <code>mpn_add_n</code> and
  <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
  vector systems).  Allowing for an error code return would require an
  awful lot of checking internally.  Perhaps it'd still be worthwhile, but
  it'd be a lot of changes and the extra code would probably be rather
  rarely executed in normal usage.

  <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
  general, lead to memory leaks and may leave GMP variables operated on in
  inconsistent states.  Maybe it'd be possible to record recovery
  information for use by the relevant allocate or reallocate function, but
  that too would be a lot of changes.

  <p> One scheme for out-of-memory would be to note that all GMP allocations go
  through the <code>mp_set_memory_functions</code> routines.  So if the
  application has an intended <code>setjmp</code> recovery point it can
  record memory activity by GMP and abandon space allocated and variables
  initialized after that point.  This might be as simple as directing the
  allocation functions to a separate pool, but in general would have the
  disadvantage of needing application-level bookkeeping on top of the
  normal system <code>malloc</code>.  An advantage however is that it needs
  nothing from GMP itself and on that basis doesn't burden applications not
  needing recovery.  Note that there are probably some details to be worked
  out here about reallocs of existing variables, and perhaps about copying
  or swapping between "permanent" and "temporary" variables.

  <p> Applications desiring a fine-grained error control, for instance a
  language interpreter, would very possibly not be well served by a scheme
  requiring <code>longjmp</code>.  Wrapping every GMP function call with a
  <code>setjmp</code> would be very inconvenient.
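  <p> The separate-pool scheme described above might be sketched roughly as
  follows.  Every name here (<code>pool_alloc</code>,
  <code>pool_release_all</code>, <code>pool_recovery</code> and so on) is
  hypothetical application code, not part of GMP.  The three allocation
  functions have the argument shapes <code>mp_set_memory_functions</code>
  expects, but the registration call is left out so the sketch builds without
  GMP.  It does not address the open questions about reallocs of variables
  that existed before the recovery point.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header placed in front of every block.
   The union keeps the user data suitably aligned. */
typedef union pool_hdr {
    struct { union pool_hdr *prev, *next; } link;
    max_align_t align;
} pool_hdr;

/* Circular list of live blocks, with pool_head as sentinel. */
static pool_hdr pool_head = { { &pool_head, &pool_head } };

/* Recovery point chosen by the application; NULL means abort on failure. */
static jmp_buf *pool_recovery = NULL;

static void pool_link(pool_hdr *h)
{
    h->link.next = pool_head.link.next;
    h->link.prev = &pool_head;
    pool_head.link.next->link.prev = h;
    pool_head.link.next = h;
}

static void pool_unlink(pool_hdr *h)
{
    h->link.prev->link.next = h->link.next;
    h->link.next->link.prev = h->link.prev;
}

static void pool_fail(void)
{
    if (pool_recovery != NULL)
        longjmp(*pool_recovery, 1);
    abort();
}

static void *pool_alloc(size_t size)
{
    pool_hdr *h = NULL;
    if (size <= SIZE_MAX - sizeof(pool_hdr))
        h = malloc(sizeof(pool_hdr) + size);
    if (h == NULL)
        pool_fail();              /* does not return */
    pool_link(h);
    return h + 1;
}

static void *pool_realloc(void *p, size_t old_size, size_t new_size)
{
    pool_hdr *h = (pool_hdr *)p - 1, *nh = NULL;
    (void)old_size;
    pool_unlink(h);               /* the block may move */
    if (new_size <= SIZE_MAX - sizeof(pool_hdr))
        nh = realloc(h, sizeof(pool_hdr) + new_size);
    if (nh == NULL) {
        pool_link(h);             /* old block is still valid */
        pool_fail();
    }
    pool_link(nh);
    return nh + 1;
}

static void pool_free(void *p, size_t size)
{
    pool_hdr *h = (pool_hdr *)p - 1;
    (void)size;
    pool_unlink(h);
    free(h);
}

/* Called at the recovery point: abandon everything still in the pool.
   Returns the number of blocks released. */
static size_t pool_release_all(void)
{
    size_t count = 0;
    while (pool_head.link.next != &pool_head) {
        pool_hdr *h = pool_head.link.next;
        pool_unlink(h);
        free(h);
        count++;
    }
    return count;
}
```

  <p> An application would pass <code>pool_alloc</code>,
  <code>pool_realloc</code> and <code>pool_free</code> to
  <code>mp_set_memory_functions</code> before its <code>setjmp</code>, and
  call <code>pool_release_all</code> after a <code>longjmp</code> back to it.
  Any GMP variable whose storage lived in the pool must then be treated as
  gone, which is exactly the bookkeeping burden noted above.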

  <p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN,
  a special value indicating an out-of-memory or other failure.  This would
  be similar to NaNs in mpfr.  Unfortunately such a scheme could only be
  used by programs prepared to handle such special values, since for
  instance a program waiting for some condition to be satisfied could
  become an infinite loop if it wasn't also watching for NaNs.  The work to
  implement this would be significant too, lots of checking of inputs and
  intermediate results.  And if <code>mpn</code> routines were to
  participate in this (which they would have to internally) a lot of new
  return values would need to be added, since of course there's no
  <code>mpz_t</code> etc structure for them to indicate failure in.

  <p> Stack overflow is another possible exception, but perhaps not one that
  can be easily detected in general.  On i386 GNU/Linux for instance GCC
  normally doesn't generate stack probes for an <code>alloca</code>, but
  merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
  miss the stack redzone and hit arbitrary data.  GMP stack usage is
  normally a function of operand size, which might be enough for some
  applications to know they'll be safe.  Otherwise a fixed maximum usage
  can probably be obtained by building with
  <code>--enable-alloca=malloc-reentrant</code> (or
  <code>notreentrant</code>).  Arranging the default to be
  <code>alloca</code> only on blocks up to a certain size and
  <code>malloc</code> thereafter might be a better approach and would have
  the advantage of not having calculations limited by available stack.

  <p> Actually recovering from stack overflow is of course another problem.
  It
  might be possible to catch a <code>SIGSEGV</code> in the stack redzone
  and do something in a <code>sigaltstack</code>, on systems which have
  that, but recovery might otherwise not be possible.  This is worth
  bearing in mind because there's no point worrying about tight and careful
  out-of-memory recovery if an out-of-stack is fatal.

  <p> Operand overflow is another exception to be addressed.  It's easy for
  instance to ask <code>mpz_pow_ui</code> for a result bigger than an
  <code>mpz_t</code> can possibly represent.  Currently overflows in limb
  or byte count calculations will go undetected.  Often they'll still end
  up asking the memory functions for blocks bigger than available memory,
  but that's by no means certain and results are unpredictable in general.
  It'd be desirable to tighten up such size calculations.  Probably only
  selected routines would need checks, if it's assumed say that no input
  will be more than half of all memory and hence size additions like say
  <code>mpz_mul</code> won't overflow.


<li> <strong>Performance Tool</strong>

  <p> It'd be nice to have some sort of tool for getting an overview of
  performance.  Clearly a great many things could be done, but some primary
  uses would be,

  <ol>
    <li> Checking speed variations between compilers.
    <li> Checking relative performance between systems or CPUs.
  </ol>

  <p> A combination of measuring some fundamental routines and some
  representative application routines might satisfy these.

  <p> The tune/time.c routines would be the easiest way to get good accurate
  measurements on lots of different systems.  The high level
  <code>speed_measure</code> may or may not suit, but the basic
  <code>speed_starttime</code> and <code>speed_endtime</code> would cover
  lots of portability and accuracy questions.


<li> <strong>Using <code>restrict</code></strong>

  <p> There might be some value in judicious use of C99 style
  <code>restrict</code> on various pointers, but this would need some
  careful thought about what it implies for the various operand overlaps
  permitted in GMP.

  <p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but
  expressing tighter (or perhaps looser) requirements.  Might be worth
  investigating that before using <code>restrict</code> unconditionally.

  <p> Loops are presumably where the greatest benefit would be had, by allowing
  the compiler to advance reads ahead of writes, perhaps as part of loop
  unrolling.  However critical loops are generally coded in assembler, so
  there might not be very much to gain.  And on Cray systems the explicit
  use of <code>_Pragma</code> gives an equivalent effect.

  <p> One thing to note is that Microsoft C headers (on ia64 at least) contain
  <code>__declspec(restrict)</code>, so a <code>#define</code> of
  <code>restrict</code> should be avoided.  It might be wisest to set up a
  <code>gmp_restrict</code>.


<li> <strong>Prime Testing</strong>

  <p> GMP is not really a number theory library and probably shouldn't have
  large amounts of code dedicated to sophisticated prime testing
  algorithms, but basic things well-implemented would suit.  Tests offering
  certainty are probably all too big or too slow (or both!) to justify
  inclusion in the main library.  Demo programs showing some possibilities
  would be good though.

  <p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is
  rather specific to the Miller-Rabin tests of the current implementation.
  Better would be some sort of parameter asking perhaps for a maximum
  chance 1/2^x of a probable prime in fact being composite.
  If
  applications follow the advice that the present reps gives 1/4^reps
  chance then perhaps such a change is unnecessary, but an explicitly
  described 1/2^x would allow for changes in the implementation or even for
  new proofs about the theory.

  <p> <code>mpz_probab_prime_p</code> always initializes a new
  <code>gmp_randstate_t</code> for randomized tests, which unfortunately
  means it's not really very random and in particular always runs the same
  tests for a given input.  Perhaps a new interface could accept an rstate
  to use, so successive tests could increase confidence in the result.

  <p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the
  trial divisions.  And since the various prime factors are constants, the
  remainder can be tested with something like
  <pre>
#define MP_LIMB_DIVISIBLE_7_P(n) \
  ((n) * MODLIMB_INVERSE_7 &lt;= MP_LIMB_T_MAX/7)
</pre>
  This would help compilers that don't know how to optimize divisions by
  constants, and is even an improvement on current gcc 3.2 code.  This
  technique works for any modulus, see Granlund and Montgomery "Division by
  Invariant Integers" section 9.

  <p> The trial divisions are done with primes generated and grouped at
  runtime.  This could instead be a table of data, with pre-calculated
  inverses too.  Storing deltas, i.e. amounts to add, rather than actual
  primes would save space.  <code>udiv_qrnnd_preinv</code> style inverses
  can be made to exist by adding dummy factors of 2 if necessary.  Some
  thought needs to be given as to how big such a table should be, based on
  how much dividing would be profitable for what sort of size inputs.  The
  data could be shared by the perfect power testing.

  <p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor
  of N must be == 1 mod 4, saving half the work in trial dividing.
  (If
  x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the
  jacobi symbol (-1/p)=1.  But also (-1/p)=(-1)^((p-1)/2), hence we must have
  p==1 mod 4.)  But knowing whether sqrt(-1) mod N exists is not too easy.
  A strong pseudoprime test can reveal one, so perhaps such a test could be
  inserted part way through the dividing.

  <p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a
  quadratic pseudoprime test taking about 3x longer than a plain test, but
  with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests
  would offer only (1/4)^3 == 1/64).  Such a test needs completely random
  parameters to satisfy the theory, though single-limb values would run
  faster.  It's probably best to do at least one plain Miller-Rabin before
  any quadratic tests, since that can identify composites in less total
  time.

  <p> Some thought needs to be given to the structure of which tests (trial
  division, Miller-Rabin, quadratic) and how many are done, based on what
  sort of inputs we expect, with a view to minimizing average time.

  <p> It might be a good idea to break out subroutines for the various tests,
  so that an application can combine them in ways it prefers, if sensible
  defaults in <code>mpz_probab_prime_p</code> don't suit.  In particular
  this would let applications skip tests they know would be unprofitable,
  like trial dividing when an input is already known to have no small
  factors.

  <p> For small inputs, combinations of theory and explicit search make it
  relatively easy to offer certainty.  For instance numbers up to 2^32
  could be handled with a strong pseudoprime test and table lookup.  But
  it's rather doubtful whether a smallnum prime test belongs in a bignum
  library.  Perhaps if it had other internal uses.

  <p> An <code>mpz_nthprime</code> might be cute, but is almost certainly
  impractical for anything but small n.
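  <p> The divisibility test by multiplication with a modular inverse,
  mentioned earlier in this section, can be tried out in isolation.  The
  sketch below fixes a 32-bit word in place of <code>mp_limb_t</code>, and
  the names <code>SKETCH_INVERSE_7</code> and
  <code>sketch_divisible_7_p</code> are hypothetical stand-ins rather than
  GMP identifiers.

```c
#include <stdint.h>

/* Inverse of 7 modulo 2^32: 7 * 0xB6DB6DB7 = 5 * 2^32 + 1. */
#define SKETCH_INVERSE_7  UINT32_C(0xB6DB6DB7)

/* Returns non-zero exactly when n is a multiple of 7.
   Multiplying by the inverse is a bijection modulo 2^32 that sends
   7*k to k, so multiples of 7 land in 0 .. UINT32_MAX/7 and every
   other value lands above that bound. */
static int sketch_divisible_7_p(uint32_t n)
{
    return (uint32_t)(n * SKETCH_INVERSE_7) <= UINT32_MAX / 7;
}
```

  <p> The test costs one multiply and one compare, with no division
  instruction.  For another odd divisor d the same shape applies with the
  inverse of d modulo 2^32 and the bound <code>UINT32_MAX / d</code>.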


<li> <strong>Intra-Library Calls</strong>

  <p> On various systems, calls within libgmp still go through the PLT, TOC or
  other mechanism, which makes the code bigger and slower than it needs to
  be.

  <p> The theory would be to have all GMP intra-library calls resolved directly
  to the routines in the library.  An application wouldn't be able to
  replace a routine, the way it can normally, but there seems no good
  reason to do that, in normal circumstances.

  <p> The <code>visibility</code> attribute in recent gcc is good for this,
  because it lets gcc omit unnecessary GOT pointer setups or whatever if it
  finds all calls are local and there are no global data references.
  Documented entrypoints would be <code>protected</code>, and purely
  internal things not wanted by test programs or anything can be
  <code>internal</code>.

  <p> Unfortunately, on i386 it seems <code>protected</code> ends up causing
  text segment relocations within libgmp.so, meaning the library code can't
  be shared between processes, defeating the purpose of a shared library.
  Perhaps this is just a gremlin in binutils (Debian packaged
  2.13.90.0.16-1).

  <p> The linker can be told directly (with a link script, or options) to do
  the same sort of thing.  This doesn't change the code emitted by gcc of
  course, but it does mean calls are resolved directly to their targets,
  avoiding a PLT entry.

  <p> Keeping symbols private to libgmp.so is probably a good thing in general
  too, to stop anyone even attempting to access them.  But some
  undocumented things will need or want to be kept visible, for use by
  mpfr, or the test and tune programs.  Libtool has a standard option for
  selecting public symbols (used now for libmp).


<li> <strong>Math functions for the mpf layer</strong>

  <p> Implement the functions of math.h for the GMP mpf layer!
  Check the book
  "Pi and the AGM" by Borwein and Borwein for ideas on how to do this.  These
  functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
  cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.

  <p> Note that <a href="http://mpfr.org">mpfr</a> already provides these
  functions, and that we usually recommend that new programs use
  mpfr instead of mpf.
</ul>
<hr>

</body>
</html>

<!--
Local variables:
eval: (add-hook 'write-file-hooks 'time-stamp)
time-stamp-start: "This file current as of "
time-stamp-format: "%:d %3b %:y"
time-stamp-end: "\\."
time-stamp-line-limit: 50
End:
-->