Pseudoprimes

main page    database    statistics    verification   

Modular exponentiation

Modular exponentiation is the process of computing ae mod n. In this project it is a basic critical task, so we are highly concerned with its efficiency.
For notation purposes, we introduce the following shorthands:
typedef unsigned long long u64;
typedef signed long long i64;

Generic 64 bit code

These 64-bit implementations are made by Arnold Meijster and Harm Bakker.

C code


u64 mulmod64(u64 a, u64 b, u64 n) {
   /* Pre: 0 <= a, b < n */
   /* returns: (a * b) % n */
   u64 c, h;
   if (b < a) {
      /* swap a and b */
      h = a; a = b; b = h;
   }
   c = 0;
   while (a != 0) {
      if (a & 1) {
         if (b >= n - c) c = b - (n - c);
         else c += b;
      }
      a >>= 1;
      if (b >= n - b) b = b - (n - b);
      else b <<= 1;
   }
   return c;
}

u64 expmod64(u64 a, u64 e, u64 n) {
   /* Pre: 0 <= a, e < n */
   /* returns: (a ^ e) % n */
   u64 c = 1;
   while (e > 0) {
      if ((e & 1) > 0) c = mulmod64(a, c, n);
      e >>= 1;
      a = mulmod64(a, a, n);
   }
   return c;
}

Typical performance: 50,000 modular exponentations per second on an Opteron 2Ghz core.

Optimized specializations

On the Mersenne forums state of the art primality and factorization projects are discussed. This discussion pointed me to some highly optimized code from Geoffrey Reynolds, located here (in arithmetic.c from the newest stable package).

52-bit C code

Due to rounding errors, the following code may produce incorrect results when arguments have more than 52 bits.

double one_over_n;
u64 mulmod52(u64 a, u64 b, u64 n) {
   i64 tmp, ret;
   register double x, y;
   x = (i64)a;
   y = (i64)b;
   tmp = (i64)n * (i64)(x * y * one_over_n);
   ret = a * b - tmp;
   if (ret < 0) ret += n;
   else if (ret >= (i64)n) ret -= n;
   return ret;
}

The associated version of expmod52 is similar to expmod64, the only difference being that one_over_n must be set.
Typical performance: 600,000 mod. exp. per second on an Opteron 2Ghz core.

i386 assembler routines

Works only on 62 bit arguments. Typical performance: 1,500,000 mod. exp. per second on an Opteron 2Ghz core.
This code is currently the best we know of. It is used heavily in the category E search to find factors of 2r-1 by trial division.