This is often not a problem; Python is designed to be easy to read and maintain, and I'm hoping that the Python code I wrote is both of those. If we were just planning to run it on elliptic curves with small coefficients - for example, the curve represented by the equation $y^2 = x^3 - 16x + 16$ - this wouldn't be an issue. Curves with small coefficients have small conductors and thus few low-lying zeros near the central point, which allows us to run the zero sum code on them with small Delta parameter values. A small Delta value means the computation will finish very quickly regardless of how efficiently it's implemented, so it probably wouldn't be worth my while trying to optimize the code in that case.
To illustrate this point, here is the first, most high-level, generic version of the method that computes the sum $\sum_{\gamma} \mathrm{sinc}^2(\Delta\gamma)$ over the zeros of a given elliptic curve $L$-function (minus documentation):
def _zerosum_sincsquared(self, Delta=1):
    r"""
    ...
    """
    npi = self._pi
    twopi = 2*npi
    eg = self._euler_gamma
    t = Delta*twopi
    expt = exp(t)
    # Terms of the explicit formula not involving the Dirichlet coefficients
    u = t*(-eg + log(RDF(self._N))/2 - log(twopi))
    w = RDF(npi**2/6 - spence(1 - RDF(1)/expt))
    # Sum the Dirichlet coefficients c_n, weighted by (t - log n),
    # over all n up to exp(t)
    y = RDF(0)
    n = int(0)
    while n < expt:
        n += 1
        cn = self.cn(n)
        if cn != 0:
            logn = log(RDF(n))
            y += cn*(t - logn)
    return 2*(u + w + y)/(t**2)
However, the whole point of my GSoC project is to produce code that can be used for mathematical research; ultimately we want to push computations as far as we can and run the code on elliptic curves with large conductor, since curves with small conductor are already well-understood. Ergo, it's time I thought about going back over what I've written and seeing how I can speed it up.
There are two distinct ways to achieve speedups. The first is to rewrite the code more cleverly to eliminate unnecessary loops, coercions, function calls etc. Here is a second version I have written of the same function (still in Python):
def _zerosum_sincsquared_fast(self, Delta=1):
    """
    ...
    """
    npi = self._pi
    twopi = 2*npi
    eg = self._euler_gamma
    t = RDF(Delta*twopi)
    expt = exp(t)
    u = t*(-eg + log(RDF(self._N))/2 - log(twopi))
    w = RDF(npi**2/RDF(6) - spence(RDF(1) - RDF(1)/expt))
    y = RDF(0)
    bound1 = int(exp(t/2))
    bound2 = int(expt)
    n = int(2)
    while n <= bound1:
        if pari(n).isprime():
            logp = log(RDF(n))
            ap = RDF(self._e.ellap(n))
            p = RDF(n)
            z = (ap/p)*(t - logp)
            # The p^n coefficients are calculated differently
            # depending on whether p divides the level or not
            if self._N % n == 0:
                aq = ap**2
                q = p**2
                logq = logp*2
                while logq < t:
                    z += (aq/q)*(t - logq)
                    logq += logp
                    q = q*p
                    aq = aq*ap
            else:
                alpha_p = CDF(ap, (4*p - ap**2).sqrt())/2
                alpha = alpha_p**2
                q = p**2
                logq = logp*2
                while logq < t:
                    aq = RDF(round(2*alpha.real()))
                    z += (aq/q)*(t - logq)
                    logq += logp
                    q = q*p
                    alpha = alpha*alpha_p
            y -= z*logp
        n += 1
    while n <= bound2:
        if pari(n).isprime():
            logp = log(RDF(n))
            ap = RDF(self._e.ellap(n))
            y -= ap*logp/RDF(n)*(t - logp)
        n += 1
    return 2*(u + w + y)/(t**2)
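A word on the trick in the good-prime branch above: when $p$ does not divide the level, the coefficient the sum needs at the prime power $q = p^k$ is $\alpha_p^k + \bar{\alpha}_p^k = 2\,\mathrm{Re}(\alpha_p^k)$, where $\alpha_p = (a_p + i\sqrt{4p - a_p^2})/2$ is a root of $X^2 - a_p X + p$; that is why the code can just multiply alpha by alpha_p at each step. As a standalone sanity check of my own (the values $p = 5$, $a_p = -2$ below are purely illustrative), this agrees with the equivalent recursion $s_{k+1} = a_p s_k - p\, s_{k-1}$, with $s_0 = 2$ and $s_1 = a_p$:

# Check that s_k := 2*Re(alpha_p^k) satisfies s_{k+1} = ap*s_k - p*s_{k-1},
# since alpha_p and its conjugate are the two roots of X^2 - ap*X + p.
# Plain Python; illustrative values only.
p, ap = 5, -2
alpha_p = complex(ap, (4*p - ap**2)**0.5)/2   # as in the method above

s = [2, ap]                                   # s_0 = 2, s_1 = a_p
for k in range(2, 10):
    s.append(ap*s[-1] - p*s[-2])              # the recursion

for k in range(10):
    assert round(2*(alpha_p**k).real) == s[k]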
Let's see how the methods match up in terms of speed. Below we run the two versions of the zero sum method on the elliptic curve $E: y^2 = x^3 - 16x + 16$, which is a rank 1 curve of conductor 37:
sage: import sage.lfunctions.zero_sums as zero_sums
sage: ZS = zero_sums.LFunctionZeroSum(EllipticCurve([-16,16]))
sage: ZS._zerosum_sincsquared(Delta=1)
1.01038406984
sage: ZS._zerosum_sincsquared_fast(Delta=1)
1.01038406984
sage: %timeit ZS._zerosum_sincsquared(Delta=1)
10 loops, best of 3: 20.5 ms per loop
sage: %timeit ZS._zerosum_sincsquared_fast(Delta=1)
100 loops, best of 3: 3.46 ms per loop
That's about a sixfold speedup, achieved just by reworking the section of the code that computes the $c_n$ sum.
The downside, of course, is that the code in method version 2 is more complicated - and thus less readable - than that in version 1. This is often the case in software development: you can write code that is elegant and easy to read but slow, or you can write code that is fast but horribly complicated and difficult to maintain. And when it comes to mathematical programming, unfortunately, sometimes the necessity for speed trumps readability.
The second major way to achieve speedups is to abandon pure Python and switch to a more low-level language. I could theoretically take my code and rewrite it in C, for example; if done relatively intelligently, I'm sure the resulting code would blow what I have here out of the water in terms of speed. However, I have no experience writing C code, and even if I did, getting the code to interface with the rest of the Sage codebase would be a major headache.
Thankfully there is a happy middle ground: Cython. Cython is a programming language - technically a superset of Python - that allows direct interfacing with C and C++ data types and structures. Code written in Cython can be as fast as C code and nearly as readable as pure Python. Crucially, because it's so similar to Python it doesn't require rewriting all my code from scratch. And Sage already knows how to deal with Cython, so there aren't any compatibility issues there.
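To give a quick toy illustration of my own (not code from the project): here is a trivial function in plain Python, and the same function with Cython's C type declarations added. With the types declared, the loop compiles down to straight C:

def harmonic_py(n):
    # Plain Python: every loop iteration manipulates Python objects
    s = 0.0
    for k in range(1, n+1):
        s += 1.0/k
    return s

def harmonic_cy(int n):
    # Cython: declared C types let the loop compile to a plain C loop
    cdef double s = 0
    cdef int k
    for k in range(1, n+1):
        s += 1.0/k
    return s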
I am therefore in the process of doing exactly that: rewriting my code in Cython. Mostly this is just a cut-and-paste job and is pleasingly hassle-free; however, in order to achieve the desired speedups, the bottleneck methods - such as our $\mathrm{sinc}^2$ zero sum method above - must be modified to make use of C data types.
Here is the third, most recent version of the _zerosum_sincsquared() method for our zero sum class, this time written in Cython:
cpdef _zerosum_sincsquared_fast(self, Delta=1, bad_primes=None):
    """
    ...
    """
    # If Delta>6.619, then we will most likely get overflow: some ap
    # values will be too large to fit into a c int (by Hasse's bound,
    # |a_p| can be as large as 2*sqrt(p) ~ 2*exp(pi*Delta), which
    # exceeds 2^31 once Delta > 6.619)
    if Delta > 6.619:
        s = "Delta value too large; will result in overflow"
        raise ValueError(s)
    cdef double npi = self._pi
    cdef double twopi = npi*2
    cdef double eg = self._euler_gamma
    cdef double t, u, w, y, z, expt, bound1, logp, logq
    cdef double thetap, thetaq, sqrtp, sqrtq, p, q
    cdef int ap, aq
    cdef long long n
    cdef double N_double = self._N
    t = twopi*Delta
    expt = c_exp(t)
    u = t*(-eg + c_log(N_double)/2 - c_log(twopi))
    w = npi**2/6 - spence(-expt**(-1) + 1)
    y = 0
    # Do bad primes first. Add correct contributions and
    # subtract incorrect contribution, since we'll add them
    # back later on.
    if bad_primes is None:
        bad_primes = self._N.prime_divisors()
    bad_primes = [prime for prime in bad_primes if prime < expt]
    for prime in bad_primes:
        n = prime
        ap = self._e.ellap(n)
        p = n
        sqrtp = c_sqrt(p)
        thetap = c_acos(ap/(2*sqrtp))
        logp = c_log(p)
        q = 1
        sqrtq = 1
        aq = 1
        thetaq = 0
        logq = logp
        z = 0
        while logq < t:
            q *= p
            sqrtq *= sqrtp
            aq *= ap
            thetaq += thetap
            # Actual value of this term
            z += (aq/q)*(t - logq)
            # Incorrect value of this term to be removed below
            z -= 2*c_cos(thetaq)*(t - logq)/sqrtq
            logq += logp
        y -= z*logp
    # Good prime case. Bad primes are treated as good primes,
    # but their contribution here is cancelled out above; this
    # way we don't have to check if each prime divides the level
    # or not.
    n = 2
    bound1 = c_exp(t/2)
    while n <= bound1:
        if pari(n).isprime():
            ap = self._e.ellap(n)
            p = n
            sqrtp = c_sqrt(p)
            thetap = c_acos(ap/(2*sqrtp))
            logp = c_log(p)
            sqrtq = 1
            thetaq = 0
            logq = logp
            z = 0
            while logq < t:
                sqrtq *= sqrtp
                thetaq += thetap
                z += 2*c_cos(thetaq)*(t - logq)/sqrtq
                logq += logp
            y -= z*logp
        n += 1
    while n <= expt:
        if pari(n).isprime():
            ap = self._e.ellap(n)
            p = n
            logp = c_log(p)
            y -= (t - logp)*(logp/p)*ap
        n += 1
    return RDF(2*(u + w + y)/(t**2))
Let's see how this version holds up in a speed test. The Cython code has already been built into Sage and the class loaded into the global namespace, so I can just call it without having to attach or load any file:
sage: ZS = LFunctionZeroSum(EllipticCurve([-16,16]))
sage: ZS._zerosum_sincsquared_fast(Delta=1)
1.01038406984
sage: %timeit ZS._zerosum_sincsquared_fast(Delta=1)
1000 loops, best of 3: 1.72 ms per loop
The good news: the Cythonized version of the method produces the same output as the Python versions, and it's definitely faster. The bad news: the speedup is only about a factor of 2, which isn't hugely impressive given how much uglier the code is.
Why is this? Crucially, we still iterate over all integers up to the bound $e^t$, testing for primality at each step. This is very inefficient: most integers are not prime (in fact, asymptotically 0 percent of all positive integers are prime); we should be using sieving methods to eliminate primality testing at those integers which we know before checking are composite. At the very least, we should handle $p = 2$ separately and then only ever iterate over the odd numbers from 3 onwards. That immediately halves the number of primality tests we have to do, and we should therefore get a comparable speedup if primality testing is what is dominating the runtime in this method.
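As a minimal sketch of that improvement (illustrative Python pseudocode, not the actual Cython method; process() and bound stand in for the real per-prime work and summation limit), the main loop skeleton would change like so:

# Current structure: primality-test every single integer
n = 2
while n <= bound:
    if pari(n).isprime():
        process(n)
    n += 1

# Odd-only structure: handle 2 once, then skip all even integers,
# halving the number of primality tests
process(2)
n = 3
while n <= bound:
    if pari(n).isprime():
        process(n)
    n += 2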
This is therefore what I hope to implement next: rework the zero sum code yet again to incorporate prime sieving. This has applicability beyond just the $\mathrm{sinc}^2$ method: all explicit formula-type sums at some point involve summing over primes or prime powers, so having access to code that can do this quickly would be a huge boon.
«we should be using sieving methods to eliminate primality testing at those integers which we know before checking are composite»
What about using the built-in prime_range function?
The built-in prime_range() function would work wonderfully for small Delta values (i.e. Delta < ~2.5), and would be a quick way to get impressive speedups there. However, the function would only be good for small Delta values, and would be *horrendously* bad for larger Delta values - which is what I'm really after.
This is because prime_range() wraps PARI's primes() function, which (as it's called in Sage) runs a Sieve of Eratosthenes method to return a list of all primes up to the specified bound.
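(For reference, this is the standard way to list all primes below a bound in Sage:

sage: prime_range(20)
[2, 3, 5, 7, 11, 13, 17, 19]
)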
The Sieve of Eratosthenes works by creating a list in memory of integers up to the bound, and then successively "crossing out" every proper multiple of each prime as it is found; at the end, the only uncrossed-out numbers are by definition prime numbers.
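In code, a bare-bones version looks something like this (my own illustrative Python, not the PARI implementation):

def eratosthenes(bound):
    # One entry per integer up to bound: this table is the memory cost
    is_prime = [True]*bound
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(bound**0.5) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting from p^2
            for m in range(p*p, bound, p):
                is_prime[m] = False
    return [n for n in range(bound) if is_prime[n]]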
This is much faster than testing each integer for primality, but the tradeoff is that one has to create in memory an array the size of the bound parameter. The issue for my project is that the bound we need to enumerate primes up to is $e^{2\pi\Delta}$, i.e. exponential in the parameter Delta. At one byte per integer, Delta=2 needs a sieve table about 300 kilobytes in size - no problem. However, use Delta=3 and we're up to 150 MB, and Delta=4 requires a table 82 GB in size. The latter is unworkable, which is why I can't just naively call the prime_range() function.
There are more intelligent ways to sieve, though. For example, we could break the range up into chunks of manageable size and only work with the primes in a given interval at each step. This is perhaps a bit slower than a pure start-from-1 sieve, but way more memory efficient. PARI even has a method that does exactly this, I believe; however, it's not wrapped in Sage, so I'll have to do a bit of extra work to get at its functionality.
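A rough sketch of that chunked approach (again my own illustrative Python, not PARI's method):

def primes_in_interval(lo, hi, small_primes):
    # Sieve only the window [lo, hi), given all primes up to sqrt(hi);
    # memory usage is proportional to the window size, not to hi
    is_prime = [True]*(hi - lo)
    for p in small_primes:
        # First multiple of p in the window, skipping p itself
        start = max(p*p, ((lo + p - 1)//p)*p)
        for m in range(start, hi, p):
            is_prime[m - lo] = False
    return [lo + i for i in range(hi - lo) if is_prime[i] and lo + i > 1]

# Usage sketch: walk up to the bound in fixed-size chunks, so the full
# exponentially-large table never exists in memory all at once:
# small_primes = eratosthenes(int(bound**0.5) + 2)  # sketch above
# for lo in range(2, bound, chunk):
#     for p in primes_in_interval(lo, min(lo + chunk, bound), small_primes):
#         ...  # process prime p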
I see, thank you for the detailed clarification. :)