Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.






This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles after a
store.  On the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule each load and
the instruction that depends on it really far from each other.
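
The scheduling advice above can be illustrated at the source level with a
small C sketch.  It is not code from this directory, and the function name
sum_pipelined is hypothetical.  Each iteration issues the load for the next
step before it consumes the value loaded one step earlier, which is the
shape a software-pipelined assembly loop takes.  A C compiler is free to
reorder these statements, so this only shows the structure, not guaranteed
timing.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n 32-bit words.  The load for step i is issued before the add that
   consumes the word loaded for step i-1, so the load and its dependent
   instruction are separated by other work. */
uint64_t sum_pipelined(const uint32_t *p, size_t n)
{
    if (n == 0)
        return 0;

    uint64_t acc = 0;
    uint32_t cur = p[0];              /* prologue: first load */

    for (size_t i = 1; i < n; i++) {
        uint32_t next = p[i];         /* load for the next step, issued early */
        acc += cur;                   /* use the value loaded one step ago */
        cur = next;
    }

    acc += cur;                       /* epilogue: consume the last load */
    return acc;
}
```

The prologue and epilogue are the cost of the transformation: one load is
hoisted out in front of the loop and one use is left behind it.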

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)

	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib	Loop

2. mpn_addmul_1 could be improved from the current 10 cycles/limb to 7.5
   cycles/limb (asymptotically) on the PA7100, using the instructions below.
   With proper software pipelining and the unrolling level shown below, the
   speed becomes 8 cycles/limb.

	fldds	s1_ptr
	fldds	s1_ptr

	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)
	xmpyu
	fstds	N(%r30)

	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	ldws	N(%r30)
	addc
	addc
	addc
	addc
	addc	%r0,%r0,cy-limb

	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	ldws	res_ptr
	add
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr
	addc
	stws	res_ptr

	addib

3. For the PA8000 we have to stick to using 32-bit limbs until compiler
   support emerges.  But we want to use 64-bit operations whenever possible,
   in particular for loads and stores.  It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned), or by masking plus
   bit-field inserting (when they are not).  The speed should double
   compared to the code used today.
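
As a point of reference for items 1 to 3, the C sketch below spells out what
the three loops compute on 32-bit limbs.  It is not the code in this
directory and says nothing about cycle counts.  The names limb_t, ref_mul_1,
ref_addmul_1 and ref_add_n_pairs are hypothetical.  The 64-bit product
stands in for what xmpyu delivers, and ref_add_n_pairs models the idea in
item 3 by packing two limbs into one 64-bit value in registers; it does not
model the aligned or unaligned memory accesses (rotating versus masking and
inserting).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name: a 32-bit limb, as on pa32 */

/* {rp,n} = {s1p,n} * v; returns the most significant (carry-out) limb. */
limb_t ref_mul_1(limb_t *rp, const limb_t *s1p, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 product plus carry-in always fits in 64 bits */
        uint64_t p = (uint64_t)s1p[i] * v + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}

/* {rp,n} += {s1p,n} * v; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *s1p, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* worst case is (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so no overflow */
        uint64_t p = (uint64_t)s1p[i] * v + rp[i] + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}

/* {rp,n} = {s1p,n} + {s2p,n}, two limbs per 64-bit add; returns carry. */
limb_t ref_add_n_pairs(limb_t *rp, const limb_t *s1p, const limb_t *s2p,
                       size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        /* pack limb i into the low half and limb i+1 into the high half */
        uint64_t a = ((uint64_t)s1p[i + 1] << 32) | s1p[i];
        uint64_t b = ((uint64_t)s2p[i + 1] << 32) | s2p[i];
        uint64_t s = a + b;
        limb_t c1 = s < a;            /* carry out of a + b */
        uint64_t t = s + cy;
        limb_t c2 = t < s;            /* carry out of adding the carry-in */
        rp[i] = (limb_t)t;
        rp[i + 1] = (limb_t)(t >> 32);
        cy = c1 | c2;                 /* at most one of the two can be set */
    }
    if (i < n) {                      /* odd limb count: finish one limb */
        uint64_t t = (uint64_t)s1p[i] + s2p[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}
```

A loop of this kind is mainly useful as a check: the results of an assembly
routine can be compared against it limb by limb.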




LABEL SYNTAX

The HP-UX assembler takes labels starting in column 0 with no colon,

	L$loop  ldws,mb -4(0,%r25),%r22

Gas on hppa GNU/Linux, however, requires a colon,

	L$loop: ldws,mb -4(0,%r25),%r22

This is covered by using LDEF() from asm-defs.m4.  An alternative would be
to use ".label", which is accepted by both,

		.label  L$loop
		ldws,mb -4(0,%r25),%r22

but that's not as nice to look at if you're used to assembler code having
labels in column 0.




REFERENCES

Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
part number 92432-90012.



----------------
Local variables:
mode: text
fill-column: 76
End: