Asymptotics of linear algebra exam questions

I teach a first-year linear algebra course for computer science students here in Leiden. Unsurprisingly it is rather focussed on computations with matrices. Everything is done over the real numbers, which I think is in many ways not ideal – surely computer scientists should know about vector spaces over \mathbb F_2?! But realistically many of the students have to work hard to grasp vector spaces over the reals, so this might be going a step too far.

In any case, the result is that I want to give them plenty of computational problems: to practise on, in the weekly tests (oh, how they love me!), and in the final exam. To keep the computations reasonable to do by hand, we work with small matrices, and generally it is nice if the entries are integers, or at least have small height. More than that, it is also nice if the denominators stay small throughout the working, though of course for this one has to assume the student follows some standard algorithm.

At the most basic level, this means that if I am going to ask them to invert a matrix I really want it to be in GL_n \mathbb Z for some small n. Actually, I prefer to ask them whether it is invertible and if so to find the inverse, as it takes a tiny bit more care, but this is a topic for another post. This leads to the question:

Question: for fixed values of n and H, how many elements M of GL_n \mathbb Z are there such that all entries of M and of M^{-1} are bounded in absolute value by H?

I try not to repeat questions from year to year, so you can imagine I care a great deal about the answer… Perhaps I should refine the question, to worry about any horrible numbers that might arise in intermediate steps of a reasonable algorithm.
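Not that it answers the asymptotic question, but the very small cases can at least be checked by brute force. Here is a minimal Sage sketch (written just for this post, and hopelessly slow beyond tiny n and H, since it enumerates all (2H+1)^{n^2} candidate matrices):

```python
from itertools import product

# Brute-force count of M in GL_n(ZZ) such that every entry of M and of M^{-1}
# has absolute value at most H. Only feasible for very small n and H.
def count_bounded(n, H):
    count = 0
    for entries in product(range(-H, H + 1), repeat=int(n * n)):
        M = matrix(ZZ, n, n, list(entries))
        if abs(M.det()) == 1:  # M lies in GL_n(ZZ) iff det(M) = +1 or -1
            if all(abs(x) <= H for x in M.inverse().list()):
                count += 1
    return count

print(count_bounded(2, 1))
print(count_bounded(2, 2))
```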

Well, I thought about this for a bit and did not come up with any nice solution. Then I tried googling, and it does not seem to be easy. I got stuck in a frustrating mess of paywalls and expired links (why can’t people use the arXiv?!?!), but I did find a result of Duke, Rudnick and Sarnak, who use a different height (the reciprocal of the square root of the smallest eigenvalue, I think) and give a beautiful asymptotic formula 2c_n t^{n^2/2}, where

c_n = \frac{\pi^{n^2/2}}{\Gamma((n^2-n+2)/2)\Gamma(n/2)\zeta(2)\dots\zeta(n)}.

Phew, that was fun to type. I can see why I struggled to guess this! Unfortunately I can say nothing about the proof, since the paper was published in Duke, and I am not at work.
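Out of curiosity (and as a sanity check on my typing), the constant is easy to evaluate numerically in Sage; the snippet below just plugs in the formula as quoted above, so of course it is for their height rather than for my entry-wise one.

```python
# Numerically evaluate the Duke-Rudnick-Sarnak constant c_n quoted above.
def drs_constant(n):
    zeta_prod = prod(zeta(k) for k in range(2, n + 1))
    return pi^(n^2/2) / (gamma((n^2 - n + 2)/2) * gamma(n/2) * zeta_prod)

for n in [2, 3, 4, 5]:
    print(n, drs_constant(n).n())
```

For n = 2 this gives exactly 6 (the zeta(2) cancels the pi^2), which is at least a pleasingly clean number.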

Of course there are lots of similar questions one can ask, and apparently they are not all easy! I think I will finish this post here, but I will certainly keep this topic in mind. In case anyone is interested in automatically generating linear algebra questions, some very crudely-written Sage code is available on my homepage.
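To be clear, the snippet below is a quick sketch written for this post, not the code on my homepage. One cheap trick for manufacturing such questions is to start from the identity matrix and apply a few random integer row operations: each operation preserves the determinant, so the result lies in GL_n \mathbb Z and its inverse is again an integer matrix.

```python
# Build a matrix in GL_n(ZZ) by applying random integer row operations to the
# identity; since det stays equal to 1, the inverse has integer entries.
def random_unimodular(n, steps=4, bound=2):
    M = copy(identity_matrix(ZZ, n))
    for _ in range(steps):
        i = randint(0, n - 1)
        j = randint(0, n - 1)
        if i != j:
            M.add_multiple_of_row(i, j, randint(-bound, bound))  # row_i += c * row_j
    return M

M = random_unimodular(3)
print(M)
print(M.inverse())
```

Of course the interesting part (and really what the question above is about) is how large the entries of M and M^{-1} get as you pile on more operations.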

An update

Well, as you can see, that project did not last too long! I looked over my posts again this evening and was planning to go back to working on this a little. But then I started reading the beginning of IUTchI again, and I am wondering if this is a good idea. I just don’t have the time to do this seriously, and I fear that intermittent efforts are not going to do the job. I am now planning to think a bit more about the direction of this blog – I have enjoyed writing the posts, but IUT is not necessarily the best choice of subject… I want a fun hobby at the moment, not another big job to do! Hopefully some new posts soon with a lighter theme.


More on initial theta data

The aim of this post is to look at some more of the terms in the initial theta data from last week. Before that, a quick comment on the `field of moduli’ from last time: it is tempting to consider the case F = \mathbb{Q}, where we would know that F is its own field of moduli, but sadly \mathbb{Q} does not contain a square root of -1, so this is not allowed. I have no idea yet why this square root is important (or indeed what we will need the field of moduli for); I guess I must wait and see.

Anyway, the remaining terms are l, \underline{C}_K and \underline{\epsilon}. Well, l is easy; it is a prime such that the image of the mod l representation contains SL_2(\mathbb{F}_l), and which is coprime to various things (roughly, l should be big enough).

What is \underline{C}_K? Well, it is a hyperbolic orbicurve of type (1, l-tors)±… What does this mean? The reference given is [EtTh, def 2.1], which takes some parsing. First, at the beginning of section 2 of [EtTh] we need the notion of a `smooth log curve of type (1, 1)’. I can’t quite find a definition of this, but in [Semi-graphs of Anabelioids] section 0 there seems to be enough to make a good guess: I think it is a smooth proper curve of genus 1 with 1 marked point (these are the (1, 1)), given the divisorial log structure coming from the marked point. Note that (I think) the curve is required to be smooth, not just log smooth. Let Y^{log} be such a curve. He then defines \overline{\Pi}^{ell}_Y as some strange quotient of the étale fundamental group of Y^{log}, then chooses a surjection f from \overline{\Pi}^{ell}_Y to Z/lZ satisfying some conditions, including that the map is trivial upon restriction to the decomposition group at the marked point y. This quotient map induces a cover of Y^{log} called \underline{Y}^{log}, which is a curve of type (1, l-tors). To get from this to type (1, l-tors)± we take the stack-theoretic quotient by the obvious involution, and that is what this \underline{C}_K in the `initial theta data’ is. Phew!
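Just to keep the shape of the construction in my head (so the following is my paraphrase of the above, not a quotation from [EtTh]): we have a Galois cover \underline{Y}^{log} \rightarrow Y^{log} with group Z/lZ, cut out by the chosen surjection from \overline{\Pi}^{ell}_Y, and then \underline{C}_K is (roughly) the stack-theoretic quotient [\underline{Y}^{log}/\{\pm 1\}] by the involution.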

Hmm, it would be good to understand this \underline{Y}^{log} a bit better. It is a Galois cover of Y^{log} with Galois group Z/lZ, but I guess it has some carefully controlled ramification (or at least, it is not just some random Galois cover), and I don’t really understand that. Anyway, at least we know what type of object it is, so maybe it makes sense to move on and see what properties end up being important.

The last part of the `initial theta data’ is the `cusp’ \underline{\epsilon}. Hunting around in section 0 of [Semi-graphs of Anabelioids], Mochizuki is usually thinking of a hyperbolic curve as a proper curve with some points removed, and these removed points are called `cusps’ (sorry for the awkward phrasing!). In this case \underline{\epsilon} is a cusp of the hyperbolic orbicurve \underline{C}_K that we saw just above. But which cusp? Apparently it is a cusp that arises from a nonzero element of the quotient Z/lZ in the definition of \underline{C}_K. Hmm, this is confusing – how does an element of the quotient group give a distinguished point? We have that \underline{Y}^{log} is a Galois cover of Y^{log} with Galois group Z/lZ, and we declare each point of \underline{Y}^{log} lying over the unique cusp of Y^{log} to again be a cusp (this is a guess based on p239 of [EtTh]). But how does an element of Z/lZ pick out a distinguished one (then we want to map down to the quotient by the involution…)? I don’t get this right now :-(. But at least we now understand what kind of objects all the terms in the initial theta data are, so hopefully more will become clear in future…

Initial theta data

`A collection of initial theta data’ is defined in def 3.1 of [IUTchI]. It is a 7-tuple of data, the terms of which are mostly not too confusing (though some are still unclear). The data is

(\bar{F}/F, X_F, l, \underline{C}_K, \underline{\mathbb{V}}, \mathbb{V}_{mod}^{bad}, \underline{\epsilon})

It starts out easy: F is a number field containing a square root of -1, and \bar{F} is an algebraic closure. We will write G_F for the Galois group of \bar{F}/F.

X_F is a punctured genus 1 curve over F, such that the corresponding elliptic curve E_F has stable reduction at all non-archimedean primes of F. There is also some more notation attached to this X_F. First, we write X_F \rightarrow C_F for the stack-theoretic quotient by the unique involution (so C_F admits a proper birational non-representable map to \mathbb{A}^1, which is an isomorphism outside the images of the 2-torsion points). We also have the field F_{mod}, the field of moduli of E_F. Actually, Mochizuki says it is the field of moduli of X_F, but this seems weird. This confusion between X_F and E_F comes up quite a bit – I wonder if he is thinking of X_F as a proper log curve?

Anyway, I want to understand this field. I’ve never been quite sure what a `field of moduli’ is; I guess it is the smallest field such that the classifying map to the stack of elliptic curves factors via that field. Or is this the `field of definition’? I always get them mixed up. Mochizuki directs me to [AbsTopIII, definition 5.1 (ii)], where I find `the subfield of F determined by the [open] image of Aut(X_{\bar{F}}) [i.e., the group of automorphisms of the scheme X_{\bar{F}} = X \times_F \bar{F}] in Aut(\bar{F}) = Gal(\bar{F}/\mathbb{Q})‘. This seems quite weird. What is the map Aut(X_{\bar{F}}) \rightarrow Aut(\bar{F})? I think that given an automorphism of X_{\bar{F}} there is a unique automorphism of \bar{F} making the obvious diagram commute, which allows us to define a map. But I still have no feeling for this. A quick google threw up the PhD thesis of Bonnie Huggins, which has a whole chapter on various definitions of fields of moduli and their equivalence. Taking advantage of the fact that we are in characteristic zero, I look at her theorem 1.7.1 to see that the field of moduli is the residue field at the corresponding point in the coarse moduli space, which feels much more intuitive (maybe I am weird!).
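As a concrete sanity check (a standard fact about elliptic curves in characteristic zero, not something taken from Mochizuki’s papers): the coarse moduli space of elliptic curves is the j-line, so the field of moduli of E_F should simply be

F_{mod} = \mathbb{Q}(j(E_F)),

i.e. the residue field of the point j(E_F) on the j-line, which fits nicely with Huggins’ description.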

Then \mathbb{V}_{mod} is the set of places of F_{mod}, and \mathbb{V}_{mod}^{bad} is a (finite) non-empty subset of odd places at which E_F has bad reduction (the non-emptiness is an assumption on E_F; basically we can ignore small collections of elliptic curves in proving abc, in particular those of everywhere good reduction. This is mentioned in Ariyan’s notes).

In the same paragraph Mochizuki also defines some fundamental groups (with rather unintuitive notation, to me), and a few other odds and ends, which I guess we will need at some point. There are also a few terms left in the `initial theta data’ which we have not yet discussed, but it is late, so I will save these for next time.

Corollary 2.2 of [IUTchIV]

Last time we understood roughly how corollary 2.2 of [IUTchIV] implies corollary 2.3 and hence abc. But I would like now to understand the statement of 2.2 better.

Part (i) of 2.2 was basically OK. I was confused by the notation at first; more precisely by \log(\mathfrak{q}^\forall_{(-)}) and \log(\mathfrak{q}^{\nmid 2}_{(-)}), because I did not understand Mochizuki’s use of the term `arbitrary nonarchimedean primes’. However, after more thought I think that he writes `arbitrary’ where I would write `all’ (this may be a useful thing to remember for later). I think that \log(\mathfrak{q}^\forall_{(-)}) basically just means the arithmetic degree over \mathbb{Z} of the pullback along the tautological section of the boundary of the moduli stack of (stable) elliptic curves. Similarly, \log(\mathfrak{q}^{\nmid 2}_{(-)}) means the arithmetic degree of the corresponding thing over \mathbb{Z}[\frac{1}{2}]. Having understood this notation, the statement of part (i) seemed pretty obvious. I then read the proof, and it basically says `this is obvious from the definitions’. This reassured me that so far I understand the definitions!
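If I have understood this correctly (and up to a normalisation by the degree of the field, which I have not checked), this should amount to something like

\log(\mathfrak{q}^\forall_E) \approx \sum_{v \text{ bad}} \mathrm{ord}_v(q_{E,v}) \cdot \log \#k(v),

where q_{E,v} is the Tate parameter of E at a place v of bad (multiplicative, since we assumed stable reduction) reduction: the classifying map meets the boundary of the moduli stack exactly at such v, with multiplicity \mathrm{ord}_v(q_{E,v}). For \log(\mathfrak{q}^{\nmid 2}_{(-)}) one simply drops the places above 2.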

Then I started on part (ii). It was all going fine until just before the displayed formula C1, where he starts talking about `a collection of initial Θ-data as in Theorem 1.10 that, in the notation of Theorem 1.10, satisfies…’. This made me think I needed to read theorem 1.10 before going further. Actually, looking back, maybe this is not necessary – perhaps I can ignore C1, because C2 seems to be the bit that is important later.

Anyway, at this point I decided to look at theorem 1.10. This was a rather dispiriting moment, as he immediately starts referring back to a bunch of definitions from the earlier papers. The `initial theta data’ is probably something quite easy to pick up, but later on in the statement he starts making heavy use of things defined in [IUTchIII].

Then I remembered that Ariyan Javanpeykar gave a talk at the Oxford IUT workshop on `how to reduce abc to theorem 1.10 of [IUTchIV]’. His notes were not on the website, but he kindly emailed them to me, and they were really great! After reading them I have the impression that I roughly understand how to go from 1.10 to abc, and that filling in the details would take some work but would not really involve anything crazy.

Because the aim of this is to get a broad outline of how the proof works, I am not going to look further into how to go from 1.10 to abc for now. My current plan is to go to IUTchI and learn the definition of `initial theta data’. It may be time to try reading from the beginning a bit – having already seen some of these `reducing abc to special cases’ steps, for now I am happy to believe we can keep doing this. I am more interested at this point in seeing where the content is. So I will try reading from the start for a bit, then probably get stuck and come back here, and hopefully eventually meet myself in the middle.

Theorem 2.1 of [GenEll]

Recall from the last post that this theorem was how we deduced the main corollary 2.3 of [IUTchIV] from corollary 2.2 of [loc. cit.]. The goal of this post is to understand the statement and proof. The statement I found pretty clear. The proof was basically also fine, modulo three things:

  1. The claim in the first paragraph seems to be false as stated; for example, if we take X = \mathbb{P}^1, D = \{0,1,\infty\} and e=1 then Y is a finite étale cover of \mathbb{P}^1, AKA \mathbb{P}^1 itself, which is proper but certainly not hyperbolic. However, this is trivial to fix; we simply take e large enough. Looking later in the proof we see that this is not a problem.
  2. The existence of non-critical Belyi maps. I decided to take this as a black box, to avoid getting waylaid.
  3. The reduction is to corollary 2.2, but I have not read that yet (the statement is 2.5 pages long, so this is non-trivial!). I can see a formula of the shape we need in part i.C2 of corollary 2.2, but there are lots of conditions and notation to worry about. So for now I will take on faith that the conditions of 2.2 and 2.3 match up – reading 2.2 is the next task, and hopefully this will become clearer.

Note: when I say `the proof was basically fine’, I don’t mean I checked every detail, I just mean the ideas seemed clear and reasonable. But that is all I am aiming for at the moment.

Understanding statement of main theorem

All of Mochizuki’s papers can be downloaded from his website (or at least, all the relevant ones). This is awesome; I wish everyone did this (or, even better, used the arXiv!). I will try to use the same abbreviations for the papers as he does; in particular, the `big 4’ are IUTchI–IV. This post is about some bits of IUTchIV.

The `big theorem’ is corollary 2.3; this is basically abc, uniform Vojta, etc. I am not going to go into the equivalence of all these statements. I found this quite easy to read; the only terms in the statement of 2.3 that I had to look up were the log different and the log conductor. Annoyingly one has to go to [GenEll, def 1.5] to find these, but then they are quite easy. Note that the log different vanishes for \mathbb{Q}-points, and the log conductor vanishes for proper hyperbolic curves (i.e. when we take the boundary to be empty).

From here it is a fun and easy exercise to deduce `usual abc’ from corollary 2.3. If you want more context/applications/reasons to care, you could for example look at Frey’s article [Links between solutions of A-B=C and elliptic curves] (I have a pdf, so probably it is available online somewhere). But I am going to take this as given (for now).
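For the record, here is the shape of that exercise as I understand it (I have not checked the normalisations carefully, so treat this as a sketch): given coprime positive integers with a + b = c, view \lambda = a/c as a \mathbb{Q}-point of \mathbb{P}^1 \setminus \{0, 1, \infty\}. Then, up to bounded errors, the height of \lambda is \log c, the log conductor is \log \mathrm{rad}(abc), and the log different vanishes since \lambda is a \mathbb{Q}-point. So an inequality of the shape ht \leq (1+\epsilon)(\text{log-diff} + \text{log-cond}) + C_\epsilon turns into \log c \leq (1+\epsilon)\log \mathrm{rad}(abc) + C_\epsilon, which is the usual abc inequality.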

The proof of 2.3 is rather brief; we apply [GenEll, theorem 2.1] to reduce to a `compactly bounded’ version of usual abc, which is then immediate from corollary 2.2. So the next task is to understand [GenEll, theorem 2.1].

Introduction

I am writing this blog to document my attempts to understand Mochizuki’s proposed proof of the abc conjecture (inter-universal Teichmüller theory, etc). I am not (at least at this point) trying to verify the correctness of the arguments, just trying to understand what is going on. If the proof turns out to be incorrect then perhaps this will have been a bit of a waste of time, but as the years pass with no serious errors discovered this becomes less likely. On the other hand, if the proof is correct (which is surely possible) then it is a truly spectacular theorem, and I would love to understand the proof. Moreover, if these techniques can prove abc then probably they can do other things too…

This is certainly not a full-time project – I have many other things to work on, and there is a fair chance this will go nowhere. But on the other hand devoting a few evenings a week for a few months seems a reasonable price to pay for the small but non-zero chance that I can understand some amazing things!

With all this in mind, the aim is to get a broad overview (at least at first): to understand the main objects involved and the general structure of the proof. I have read a number of overviews of the proof (e.g. by Fesenko) which I have not found illuminating, and I have also looked at Mochizuki’s own summaries, which I have not found very helpful either. For example, all this talk about `dismantling scheme theory’ sounds cool, but I have no clue what it means. So my current plan is to ignore these expository articles, ignore Mochizuki’s remarks and introductions, and just try to read the papers as I would any other long proof – first understand the statements, then look for the big definitions and lemmas, and iterate.