Efforts to Improve at Forecasting

“Failure did not mean he had reached the limits of his ability. It meant he had to think hard and give it another go. Try, fail, analyze, adjust, try again.”

– Philip E. Tetlock, Superforecasting

I’m interested in improving my ability to forecast in a calibrated, efficient way. A friend of mine is also interested! Here’s the current plan to improve the accuracy of our judgements. This is not meant to be a permanent practice, but instead an intense exercise in rapidly improving through a project.

Practice, Practice, Practice

One useful categorization for learning new topics and skills: facts, concepts, and procedures. Or, put differently: things to learn, ideas to deeply understand, and skills to perform.

It seems to me that becoming a better forecaster is, more than anything else, an exercise in implementing certain procedures. That means the best way to improve is probably just doing lots and lots of practice with a high-quality, short feedback loop.

To be sure, some concepts are important (Fermi-izing, outside and inside views, extrapolation algorithms), but once one is aware of them, the whole trick seems to be using them appropriately. Facts are of course relevant to any given question, but collecting the relevant facts for a question is itself a skill (a procedure) to be trained; stockpiling knowledge on all conceivable topics would be cumbersome.
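
To make these concepts a bit more tangible, here is a minimal sketch (in Python) of one way an outside-view base rate and an inside-view estimate can be blended. The log-odds weighting and every number below are illustrative assumptions on my part, not a method either of us has committed to.

```python
import math

# Hypothetical blend of an outside-view base rate with an inside-view estimate;
# all numbers are invented for illustration.

def to_log_odds(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def from_log_odds(lo: float) -> float:
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-lo))

# Outside view: how often have comparable events resolved "yes" historically?
base_rate = 0.30        # e.g. 3 of the last 10 similar cases (invented)

# Inside view: what do the specifics of this particular question suggest?
inside_estimate = 0.55  # e.g. after reading recent reporting (invented)

# One simple combination: a weighted average in log-odds space, leaning on the
# outside view when the case-specific evidence is weak.
weight_on_inside = 0.4
combined = from_log_odds(
    (1 - weight_on_inside) * to_log_odds(base_rate)
    + weight_on_inside * to_log_odds(inside_estimate)
)

print(f"Combined forecast: {combined:.2f}")  # ~0.39 with these inputs
```

The point isn’t this particular formula; it’s that knowing the concept is the easy part, while choosing the reference class, the adjustment, and the weight is the procedure to be practiced.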

Doing the Thing

I have the benefit of a partner for my forecasting efforts, which I expect to be immensely helpful (for social accountability, rapid feedback, information sharing).

Choosing Questions

The plan here is to create a spreadsheet of questions, one for each day of forecasting, selected for the coming week. We’ll each independently forecast the same question on the same day. The questions will be chosen for some combination of relevant factors, currently including methodological diversity (e.g. more statistical vs. more sui generis-seeming) and topical diversity (e.g. clean meat progress, outcomes of international diplomacy, economic forecasting, etc.).

Ideally, we’ll do six forecasts a week, one each day, with the seventh day reserved for reviewing and giving feedback on each other’s forecasts.

Questions will be drawn from three main forecasting platforms: Metaculus, the Center for Security and Emerging Technology’s Foretell platform, and the Good Judgment Open.

Making Forecasts

My plan is to write out my thinking for each forecast and post it here. I think my friend will keep their write-ups in a Google Drive folder or something similar. The plan is to spend around 1.5-2 hours on each initial forecast, then update as we get more information.

For updates, my intention is to set Google Alerts for terms relevant to each of my active forecasts and update the posts as new information comes in.

We’ll submit our forecasts on the relevant platforms, which will track our scores.

Getting Feedback

When forecasts resolve, the platforms will let us know how we did. Then we can do post-mortems to see how our analyses held up and what we missed.

We’ll also set (at least) one day aside for reviewing each other’s forecasts and giving feedback. Ideally, I’ll also seek out feedback on my forecasts from others who are skilled.

What could go wrong?

I expect that the most likely failure scenarios would come from one or both of us becoming too busy or not prioritizing this effort. This could happen for a variety of reasons:

  • The goal of six forecasts a week may just be over-eager; in that case, we could reduce the expected time per forecast or reduce the number of forecasts.
    • Update 2021-05-27: Turns out that it was! We’ll bring this down to three forecasts a week for now.
  • One or both of us could go on vacation or take on a higher-priority project for some fixed amount of time, put the project on pause, and then never restart (my friend may start a new job soon, and acclimating to that could take up a lot of time!); in that case, we could set a reminder to check back in after some time to see if we’d like to restart.

Hopefully, the upfront investment we’ve made in collaboration infrastructure and planning will make restarting fairly straightforward if we pause.

What would success look like?

I expect that we’ll both be pretty inaccurate in our forecasts, at least at first. I think smashing success would entail something like:

  1. Dramatic improvement over our baseline accuracy (Brier scores cut by a third? See the sketch after this list for what that would mean numerically)
  2. Ending up in the 95th percentile on the platforms we forecast on
  3. A comfortable grasp on the techniques available for making competent forecasts and when to use them
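
For binary questions, the Brier score is the squared difference between the forecast probability and the outcome (1 if it happened, 0 if it didn’t), averaged over questions, so lower is better. To make the “cut by a third” target concrete, here’s a toy calculation in Python with invented forecasts:

```python
# Toy Brier score comparison for binary questions; all forecasts are invented.

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 1, 0, 0]  # how four hypothetical questions resolved

# A hedgy baseline: every forecast sits at 50%.
baseline = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)  # 0.25

# Later forecasts on the same questions: a bit bolder, right on direction.
improved = brier_score([0.6, 0.6, 0.4, 0.4], outcomes)  # 0.16

print(f"Reduction: {1 - improved / baseline:.0%}")  # 36%, roughly a third
```

Scoring details vary by platform, but the basic “lower is better” squared-error intuition is the relevant one for this goal.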

I think success likely also involves improving certain related competencies, like:

  1. Finding the highest-value information quickly; knowing the biases of a large range of sources one might consult and having a good sense of the strength of the evidence they present
  2. Carefully inspecting one’s own reasoning for strength of evidence; red-teaming oneself effectively; reasoning transparently to enable all of this
  3. Breaking difficult questions down into their component parts; creating useful mental models of the world one is trying to analyze.

I’m pretty excited about all of this! I think that the list of related competencies will only grow as we get more into it.

Thanks for reading; I’ll keep you posted.