Tuesday, July 29, 2014

The average rank of elliptic curves

It's about time I demonstrated the utility of the code I've written - the aim of the game for my GSoC project, after all, is to provide a new suite of tools with which to conduct mathematical research.

First some background. Given an elliptic curve $E$ specified by equation $y^2 = x^3 + Ax + B$ for integers $A$ and $B$, one of the questions we can ask is: what is the average rank of $E$ as $A$ and $B$ are allowed to vary? Because there are an infinite number of choices of $A$ and $B$, we need to formulate this question a bit more carefully. To this end, let us define the height of $E$ to be
$$ h(E) = \max\{|A|^3,|B|^2\} $$
[Aside: The height essentially measures the size of the coefficients of $E$ and is thus a fairly decent measure of the arithmetic complexity of the curve. We need the 3rd and 2nd powers in order to make the height function scale appropriately with the curve's discriminant.]
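For example, the curve $y^2 = x^3 - 2x + 5$ has height $h(E) = \max\{|-2|^3, |5|^2\} = \max\{8, 25\} = 25$.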

We can then ask: what is the limiting value of the average rank of all curves up to height $X$, as $X$ gets bigger and bigger? Is it infinity? Is it 0? Is it some non-trivial positive value? Does the limit even exist? It's possible that the average rank, as a function of $X$, could oscillate about and never stabilize.

There are strong heuristic arguments suggesting that the answer should be exactly $\frac{1}{2}$. Specifically, as the height bound $X$ gets very large, half of all curves should have rank 0, half should have rank 1, and a negligible proportion should have rank 2 or greater.

Even as recently as five years ago, nothing concrete was known unconditionally about average curve rank. There are results by Brumer, Heath-Brown and Young providing successively better upper bounds on the average rank of curves ordered by height (2.3, 2 and $\frac{25}{14}$ respectively), but these results are contingent on the Riemann Hypothesis.

However, starting in 2011 Manjul Bhargava, together with Christopher Skinner and Arul Shankar, published a series of landmark papers (see here for a good expository slideshow, and here and here for two of the latest publications) proving unconditionally that the average rank - that is, the limiting value of the average rank of all elliptic curves up to height $X$ - is bounded above by 0.885. A further consequence of their work is that at least 66% of all elliptic curves satisfy the rank part of the Birch and Swinnerton-Dyer Conjecture.

To a computationally-minded number theorist, the obvious question to ask is: Does the data support these results? I am by no means the first person to ask this question. Extensive databases of elliptic curves under various orderings have already been compiled, most notably those by Cremona (ordered by conductor) and Stein-Watkins (ordered essentially by discriminant). However, as yet no extensive tabulation of height-ordered elliptic curves has been carried out.

Here is a summarized table of elliptic curves with height at most 10000 - a total of 8638 curves, and the ranks thereof (all computations done in Sage, of course):

Rank     # Curves    %
0        2988        34.59%
1        4307        49.86%
2        1286        14.89%
3        57          0.66%
$\ge$4   0           0.00%
Total:   8638

Thus the average rank of elliptic curves is 0.816 when the height bound is 10000. This is worrisome: the average is significantly different from the value of 0.5 we're hoping to see.

The situation gets even worse when we go up to height bound 100000:

Rank     # Curves    %
0        19492       33.11%
1        28818       48.96%
2        9747        16.56%
3        801         1.36%
$\ge$4   4           0.01%
Total:   58862

This yields an average rank of 0.862 for height bound 100000. Bizarrely, the average rank is getting bigger, not smaller!
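To spell out the arithmetic, here is a quick Python snippet recomputing both averages directly from the tables above (the handful of rank $\ge$4 curves at height 100000 are counted as rank exactly 4, which doesn't affect the rounded figure):

```
# Average rank from the two tables above; rank >= 4 rows are counted as rank 4.
counts_10k  = {0: 2988,  1: 4307,  2: 1286, 3: 57,  4: 0}
counts_100k = {0: 19492, 1: 28818, 2: 9747, 3: 801, 4: 4}

def average_rank(counts):
    total = sum(counts.values())
    return sum(r * n for r, n in counts.items()) / float(total)

print(average_rank(counts_10k))    # 0.816...
print(average_rank(counts_100k))   # 0.862...
```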

[Note: the fact that 0.862 is close to Bhargava's asymptotic bound of 0.885 is coincidental. Run the numbers for height 1 million, for example, and you get an average rank of 0.8854, which is bigger than the above asymptotic bound. Observationally, we see the average rank continue to increase as we push out to even larger height bounds beyond this.]

So what's the issue here? It turns out that a lot of the asymptotic statements we can make about elliptic curves are precisely that: asymptotic, and we don't yet have a good understanding of the associated rates of convergence. Elliptic curves, ornery beasts that they are, can seem quite different from their limiting behaviour when one only considers curves with small coefficients. We expect (hope?) that the average will eventually turn around and start to decrease back down to 0.5, but the exact point at which that happens is as yet unknown.

This is where I come in. One of the projects I've been working on (with Wei Ho, Jen Balakrishnan, Jamie Weigandt, Nathan Kaplan and William Stein) is to compute the average rank of elliptic curves for as large a height bound as possible, in the hope that we will get results a bit more reassuring than the above. The main steps of the project are thus:

  1. Generate an ordered-by-height database of all elliptic curves up to some very large  height bound (currently 100 million; about 18.5 million curves);
  2. Use every trick in the book to compute the ranks of said elliptic curves;
  3. Compute the average of said ranks.
Steps 1 and 3 are easy. Step 2 is not. Determining the rank of an elliptic curve is a notoriously hard problem - no unconditional algorithm with known complexity currently exists - especially when you want to do it for millions of curves in a reasonable amount of time. Sage, for example, already has a rank() method attached to its EllipticCurve class; if you pass the right parameters to it, the method will utilize an array of approaches to return a value that is (assuming standard conjectures) the curve's rank. However, its runtime for curves of height near 100 million is on the order of 20 seconds; set it loose on 18.5 million such curves and you're looking at a total computation time of about 10 CPU years.
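To make step 1 and the cost of the naive approach to step 2 concrete, here is a minimal Sage sketch under some assumed conventions (every integer pair $(A,B)$ within the height bound and with nonzero discriminant is kept, and no attempt is made to identify isomorphic models; with these conventions the count at height 10000 agrees with the first table above):

```
# Minimal sketch of step 1: enumerate curves y^2 = x^3 + Ax + B of height <= X.
def curves_up_to_height(X):
    A_max = int(round(X ** (1 / 3.0)))
    while A_max ** 3 > X:                  # correct any floating-point error
        A_max -= 1
    while (A_max + 1) ** 3 <= X:
        A_max += 1
    B_max = isqrt(X)                       # integer square root
    curves = []
    for A in range(-A_max, A_max + 1):
        for B in range(-B_max, B_max + 1):
            if 4 * A**3 + 27 * B**2 != 0:  # discard singular curves
                curves.append(EllipticCurve([A, B]))
    return curves

curves = curves_up_to_height(10**4)
print(len(curves))                         # 8638 curves under these conventions
# The naive version of step 2 -- feasible at this height, hopeless at 10^8:
# ranks = [E.rank() for E in curves]
```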

Enter GSoC project stage left. At the expense of assuming the Generalized Riemann Hypothesis and the Birch and Swinnerton-Dyer Conjecture, we can use the zero sum rank bounding code I've been working on to quickly compute concrete upper bounds on an elliptic curve's rank. This approach has a couple of major advantages to it:
  • It's fast. In the time it's taken me to write this post, I've computed rank bounds on 2.5 million curves.
  • Runtime is essentially constant for any curve in the database; we don't have to worry about how the method scales with height or conductor. If we want to go up to larger height bounds at a later date, no problem.
As always, some Terms and Conditions apply. The rank bounding code only gives you an upper bound on the rank: if, for example, you run the code on a curve and get the number 4 back, there's no way to determine with this method if the curve's rank is 4, or if it is really some non-negative integer less than 4. 

Note: there is an invariant attached to an elliptic curve called the root number which is efficiently computable, even for curves with large conductor (it took less than 20 minutes to compute the root number for all 18.5 million curves in our database). The root number is one of two values: -1 or +1; if it's -1 the curve has odd analytic rank, and if it's +1 the curve has even analytic rank. Assuming BSD we can therefore always easily tell if the curve's rank is even or odd. My GSoC rank estimation code takes the root number into account, so in the example above, a returned value of 4 tells us that the curve's true rank is one of three values: 0, 2 or 4.
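As a quick illustration of this parity refinement (the helper function and the example curve below are hypothetical, not part of my GSoC code):

```
# Combine an analytic rank upper bound with the root number: +1 forces even
# (analytic) rank, -1 forces odd rank, so only every other value survives.
def possible_ranks(rank_bound, root_number):
    parity = 0 if root_number == 1 else 1
    return [r for r in range(rank_bound + 1) if r % 2 == parity]

E = EllipticCurve([-7, 36])            # an arbitrary example curve
w = E.root_number()                    # cheap, even for large conductor
print(w, possible_ranks(4, w))         # e.g. with w = +1 a bound of 4 gives [0, 2, 4]
```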

Even better, if the returned value is 0 or 1, we know this must be the actual algebraic rank of the curve: firstly, there's no ambiguity as to what the analytic rank is - it has to be the returned 0 or 1; secondly, the BSD conjecture has been proven in the rank 0 and 1 cases. Thus even though we are a priori only computing analytic rank upper bounds, for some proportion of curves we've found the actual algebraic rank.

[Note that the rank bounding code I've written is predicated on knowing that all nontrivial zeros of an elliptic curve $L$-function lie on the critical line, so we still have to assume the Generalized Riemann Hypothesis.]
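To illustrate the principle behind the zero sum (this is not how the actual code works - it evaluates the sum via the explicit formula from L-series coefficients and never computes zeros directly), here is a Sage sketch that evaluates a truncated version of the sum using low-lying zeros supplied by lcalc:

```
from math import sin, pi

def sinc_sq(x):
    # Nonnegative test function with value 1 at x = 0
    return 1.0 if abs(x) < 1e-12 else (sin(pi * x) / (pi * x)) ** 2

def truncated_zero_sum(E, Delta=1.0, num_zeros=30):
    # Sum f(Delta*gamma) over nontrivial zeros 1/2 + i*gamma: every zero at the
    # central point contributes 1 and all terms are nonnegative, so the sum is
    # at least the analytic rank. Truncating to the first few zeros drops a
    # positive tail, so this is an illustration rather than the real bound.
    total = 0.0
    for gamma in E.lseries().zeros(num_zeros):   # zeros with gamma >= 0
        g = float(gamma)
        weight = 1 if abs(g) < 1e-6 else 2       # gamma > 0 pairs with -gamma
        total += weight * sinc_sq(Delta * g)
    return total

E = EllipticCurve([0, 0, 1, -7, 6])   # the well-known rank 3 curve 5077a
print(E.rank(), truncated_zero_sum(E, Delta=1.0))   # the printed sum is at least 3
```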

Thus running the rank bound code on the entire database of curves is a very sensible first step, and it's what I'm currently doing. It's computationally cheap to do - on SageMathCloud, using a Delta value of 1.0, the runtime for a single curve is about 4 milliseconds. Moreover, for some non-negligible percentage of curves the bounds will be observably sharp - based on some trial runs I'm expecting about 20-30% of the computed bounds to be 0 or 1.

That's about 4 million curves for which we won't have to resort to much more expensive rank finding methods. Huge savings!

11 comments:

  1. I have a number of comments (more than the 4096 character limit allows!)

    1) The choice of max(A^3,B^2) in the height bound is not canonical. Neither is max(4A^3,27B^2), but there are reasons to use the latter. In fact, I remember Frank Calegari saying that he specifically disliked Bhargava phrasing all his results in terms of bounds on average ranks (like 0.885), as the specific constants one obtains would be sensitive to the choice of 4 and 27. I am not sure I agree with Frank, but it is something to consider.

    2) I can't think that 10 cpu-years is all *that* long given cluster sizes today. I guess it depends on how important you think this computation is (I heard a Bianchi modular form team with Cremona saying 900000 cpu-hours for their work, which is over 100 cpu-years). See also #6 below.

    3) The paper of Stein (with Mazur, Bektemirov and Watkins) in the BAMS a few years back gave some additional evidence with the "obvious" idea of sampling curves at a higher height (Section 5.2.1). They say it is nontrivial to achieve an unbiased sample (with respect to real period?), but I am not thoroughly convinced of this (or sure that their methods are truly robust).

  2. 4) Regarding "essentially constant" runtime: The parameters chosen in this rank bounding method should depend on height. More precisely, to get good results you should be taking L-series coefficients to some power of the conductor (fortunately much less than the sqrt, which would allow you just to compute the L-values). Maybe this power z is so small that you can consider N^z to be constant in your range. My guess (from some experiments) is that z=1/4 already works a good deal of the time, even z=1/8 is reasonable to try for larger N, which for curves of conductor up to 10^12 is 1000 L-series coeffs, which as you say is microseconds. But I insist that asymptotically at least, the number of coeffs used should scale with N by some formula.

    5) I went through an analysis like yours about a month ago, though I was ordering by discriminant (cf. Stein-Watkins database), considering a height bound of 10^12, or about 10^10 curves. I ended up concluding that the "huge savings" was a constant of size maybe 3 or 5, the point is that you *do* have a lot of rank 2 curves (20%?), not to mention another (small) percentage with low-lying zeros. Unfortunately, I was unable to convince myself that this savings of 3 or 5 really made the experiment desirable, as the convergence to the asymptotic should be slow (logarithmic?). My conclusion was that the sampling (#3 above) was simply superior evidence. The main reason why to consider all curves up to a given height would be to dovetail with the Bhargava/Shankar bounds, but as #1 notes, this is finicky in the first place. OTOH, I *did* think of using analytic rank as you do (I got the idea from Bober's ANTS paper), and this should be just as applicable when sampling. Bottomline: whatever method you use to compute ranks, sampling should be part of your arsenal in statistical analysis.

    6) Finally "However, its runtime for curves of height near 100 million is on the order of 20 seconds;" ... I don't know what Sage is doing, but this seems *way* too long. The Cremona mwrank method should be suitable for many curves (the discriminant is small enough), and for the rest the algebraic 2-descent method should be used, where the (PARI?) class group computation (assuming GRH) with e.g. Denis Simon's code should not be overwhelming. In reality, I might choose the second method, as the runtime is more nearly constant. I really can't imagine *any* curve less than conductor 10^10 taking more than a second (2-Selmer computations give an upper bound on the rank, which as with your methods filters out a large subset). I ran a simple loop in Magma with curves A of size 10^4 and B about 10^6, and every TwoSelmerGroup call took less than 0.3 seconds, with an average of about 0.1s. Obviously your analytic rank methods are still useful, but I think you should reconsider the timings for the other methods, if nothing else to get the comparison factor correct. My *guess* is that "rank" in Sage is doing the whole work of computing the Mordell-Weil group in every case(?), which is overkill. Maybe you can investigate this and figure out what is going on.

  3. Hi CatacycleUmberto

    1) Agreed, the definition of height is not canonical; right at the beginning of this project we were umming and aahing as to whether to use the [4,27] coefficients or not, and in the end opted not to. Certainly whatever resulting averages we eventually get out will be affected by this; however, the idea is that their inclusion/exclusion shouldn't affect the asymptotics. The good news is that reordering curves by a recalibrated height is quick, so it won't be the bottleneck. Either way I doubt height 10^8 - however you choose to define it - is going to be large enough to see average rank stop increasing and start decreasing.

    I know Bhargava's work is really about average n-Selmer group sizes; we are computing these as we go too, so hopefully we'll have some data to show there as well.

  4. 2) 10 CPU years isn't too long in the greater scheme of things, but if we can cut that down by an order of magnitude by using other methods, then we should certainly do so. This is a relatively small project and we don't as yet have much in the way of cluster access, so I'm trying to make computations as efficient as possible. In all likelihood we will have a certain proportion of curves at the end for which all the easy methods fail, and we'll have to resort to some beefier hardware to get those ranks.

  5. Two more comments: You mention the works of Brumer and Heath-Brown, but neglect to point out that Matt Young beat the 2-barrier, achieving 25/14. http://www.ams.org/journals/jams/2006-19-01/S0894-0347-05-00503-5/

    In the last sentence of that paragraph, it is hard to parse what you are saying, but I don't think the current Bhargava/Shankar methods can hope to get much below 0.8 (or 0.85), let alone reach 1/2. I think your prior phrasing of "average rank of E for all curves up to height X, as X gets bigger and bigger" makes it sound (to the outsider) that B/S could improve their 0.885 by taking X larger, whereas that isn't really what is going on with any numerical improvements they might get. Rather, one way to improve 0.885 would be to understand the variation of root number in a larger class of curves than what they currently use.

    And a comment on my comment #1: Maybe I am confusing what Calegari said. He might have been more fretful about the "65% of curves satisfy BSD" statements. See his comment here. http://mattbakerblog.wordpress.com/2014/03/10/the-bsd-conjecture-is-true-for-most-elliptic-curves/comment-page-1/#comment-235

  6. Perhaps a comment in a different direction. If you want to compute a_p fast for many elliptic curves, one way is to do a giant precomputation.

    For instance, for primes up to X, for each j-invariant mod p you precompute (up to sign) the associated a_p, and then just need to figure out which twist you have. This latter bit I think is a Kronecker computation, so much faster than the a_p computation (a sketch of the underlying twist relation follows below).

    For X=10^6 the table should be about 75GB in size (thus it fits in RAM on large machines). For an *experiment* this is probably better, as it allows you to compute to 10^6 maybe 100x faster, but for *usable Sage code* it is obviously not, sort of like your previous "computing primes" question.
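    A hedged Sage sketch of the twist relation this idea relies on; the curve, twisting factor and prime below are arbitrary illustrative choices:

    ```
    # Away from j = 0 and 1728, curves with the same j-invariant are quadratic
    # twists of one another, and for squarefree d and a prime p of good
    # reduction not dividing 2*d one has a_p(E^(d)) = kronecker(d, p) * a_p(E).
    E = EllipticCurve([-7, 6])      # y^2 = x^3 - 7x + 6
    d = 2
    Ed = E.quadratic_twist(d)
    p = 101
    print(E.ap(p), kronecker(d, p) * Ed.ap(p))   # the two values agree
    ```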

  7. Thanks, I've mentioned Matt Young's results and added a link to his paper in my post, and reworded the definition of average rank and Bhargava's results thereon to make it a bit clearer.

  8. 3) We haven't thought too hard about sampling; the aim of this project from the outset was to compute the ranks of *all* curves up to some large height bound. It might just be that random sampling of curves ordered by naive height is actually less prone to bias than, say, discriminant; in this case sampling would be a good idea. Will have to think about this some more.

  9. The parameter Delta determines the runtime of the routine. However, Delta directly specifies the tightness of the zero sum; since curves with large conductor have more zeros, we should scale our choice of Delta accordingly to get rank estimates of comparable fidelity.

    Specifically, if $E$ has conductor $N$, then if I remember correctly the expected number of zeros with imaginary part at most 1 in magnitude is $\frac{\log(N)}{2\pi} + O(1)$. Accordingly Delta should be set to some constant times $\log(N)$. Since runtime is exponential in $\Delta$, this is tantamount to saying that in order to produce rank bounds of constant fidelity, runtime will scale with conductor to some power. In other words: yes, you're correct.

    I haven't done this in my code yet. I should at least make it available as an option; a rough sketch of the scaling rule is below.
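    A rough sketch of that scaling rule (the constant c is a free tuning parameter, not a value from my actual code):

    ```
    # Choose Delta proportional to log(N), so the zero sum resolves the roughly
    # log(N)/(2*pi) low-lying zeros with comparable fidelity; since runtime is
    # exponential in Delta, total cost then grows like a power of N.
    from math import log, pi

    def suggested_delta(N, c=1.0):
        return c * log(N) / (2 * pi)

    for N in [10**3, 10**6, 10**9, 10**12]:
        print(N, suggested_delta(N))
    ```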

  10. 5) See replies to 2) and 3) above.

    6) I wasn't being intelligent at all when doing the Sage rank() timings - I just called the method with the specified parameters and checked how long it took. No doubt there are faster ways to get rank out using Sage. For example, I'm aware one could speed things up when running 2-descent on lots of curves by combining all the homogeneous space computations. Also, since we're assuming GRH anyway, the class group computation part should be faster. And there's bound to be some duplication that we should eliminate by keeping track of the right things.

    Thanks for all your comments. Who am I replying to, by the way?

    - Simon
