Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.






This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000 no memory instruction can issue in the two cycles after a
store.  For the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule loads and their
dependent instructions far apart from each other.

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)

        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib   Loop

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

        fldds   s1_ptr
        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        addc
        addc
        addc
        addc
        addc    %r0,%r0,cy-limb

        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        add
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib

3. For the PA8000 we have to stick to 32-bit limbs until compiler support
   emerges.  But we want to use 64-bit operations whenever possible, in
   particular for loads and stores.  It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned) or by masking and
   bit-field inserting (when they are not).  The speed should double
   compared to the code used today.




LABEL SYNTAX

The HP-UX assembler takes labels starting in column 0 with no colon,

        L$loop  ldws,mb -4(0,%r25),%r22

Gas on hppa GNU/Linux however requires a colon,

        L$loop: ldws,mb -4(0,%r25),%r22

This is covered by using LDEF() from asm-defs.m4.
An alternative would be to use ".label", which is accepted by both,

        .label  L$loop
        ldws,mb -4(0,%r25),%r22

but that's not as nice to look at, not if you're used to assembler code
having labels in column 0.




REFERENCES

Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
part number 92432-90012.



----------------
Local variables:
mode: text
fill-column: 76
End:
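

APPENDIX: WHAT THE STATUS ROUTINES COMPUTE

For readers new to the routines named under STATUS, the C sketch below
shows what mpn_mul_1 (item 1) and mpn_addmul_1 (item 2) compute, assuming
32-bit limbs stored least significant limb first.  It is an illustration
only, not GMP's code: the names ref_mul_1 and ref_addmul_1 are hypothetical,
and the 64-bit product merely stands in for the 32x32->64 unsigned multiply
that xmpyu provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference versions, NOT taken from GMP.  Limbs are 32 bits,
   least significant limb first. */

/* rp[0..n-1] = s1[0..n-1] * v; returns the carry-out (most significant) limb. */
uint32_t ref_mul_1(uint32_t *rp, const uint32_t *s1, size_t n, uint32_t v)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 product plus incoming carry; this cannot overflow 64 bits. */
        uint64_t p = (uint64_t)s1[i] * v + cy;
        rp[i] = (uint32_t)p;          /* low half is the result limb */
        cy = (uint32_t)(p >> 32);     /* high half carries into the next limb */
    }
    return cy;
}

/* rp[0..n-1] += s1[0..n-1] * v; returns the carry-out limb. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *s1, size_t n, uint32_t v)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* Worst case is (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so still no overflow. */
        uint64_t p = (uint64_t)s1[i] * v + rp[i] + cy;
        rp[i] = (uint32_t)p;
        cy = (uint32_t)(p >> 32);
    }
    return cy;
}
```

The extra load and add per limb in ref_addmul_1 correspond to the additional
"ldws res_ptr" and add/addc instructions listed in item 2.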