
Remark Aug '22: I'm migrating this to docs.

StreamSkaters

One bridge between the /skaters and the microprediction leaderboards is provided by the StreamSkater class in the microprediction package, illustrated in the StreamSkater examples folder. This makes it trivial to use any skater from the TimeMachines package in a MicroCrawler (a live algorithm).

More about the Microprediction Python Client

See also README_EXAMPLES.md or README_LONGER.md

Class Hierarchy

Use MicroReader if you just need to get data and don't care to use a key.

    MicroReader
       |
    MicroWriter ----------------------------
       |                                   |
    MicroPoll                         MicroCrawler
    (feed creator)               (self-navigating algorithm)

You can pull most data directly, by the way, without a key.

Scheduled submissions versus "crawling"

The MicroWriter class can publish data or submit predictions. However, if you intend to run a continuous process you might consider the MicroCrawler class or its derivatives.

| Type | Suggestion | Example | More examples |
| --- | --- | --- | --- |
| Scheduled submission | MicroWriter | Ambassy Fox | submission_examples_transition |
| Running process | MicroCrawler | Malaxable Fox | crawler_examples |
| Running process using timemachines | StreamSkater | Shole Gazelle | crawler_skater_examples |

A more complete picture would include SimpleCrawler, RegularCrawler, OnlineHorizonCrawler, OnlineStreamCrawler and ReportingCrawler.

Publishing absolute quantities versus changes

It is often better to publish changes in values than actual values of live quantities, to avoid race conditions or latency issues. There is a discussion in the README_LONGER.md.

Certainly it is easy to publish live quantities using only the MicroWriter as shown in traffic_live.py. However, you might consider publishing changes instead, as in the sketch below.
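For instance, a minimal sketch of publishing changes rather than levels, assuming a hypothetical get_price() that returns the live quantity:

    import time
    from microprediction import MicroWriter

    mw = MicroWriter(write_key=write_key)    # Assumes you already have a write_key
    prev = get_price()                       # get_price() is a hypothetical live data source
    while True:
        time.sleep(20 * 60)                  # Match your stream's natural update interval
        current = get_price()
        mw.set(name='my_stream_change.json', value=current - prev)   # Publish the change, not the level
        prev = current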

Microprediction.com and microprediction.org

The former contains the blog, a knowledge center with video tutorials, details of competitions and prize money, and so forth. The latter is a browser for humans looking to see how their algorithms are performing, or whether their streams are updating.

Slack & Google Meets Tue 8pm / Fri noon EST

Most people looking to contribute to this open initiative (and win beer money) join the microprediction slack. If that invite fails there might be one in the knowledge center that hasn't expired. There you will find Google Meet invite details for our regular informal chats.

Microprediction bookmarks

Data: stream list | stream explanations | csv

Client: client | reader | writer | crawler | crawler examples | notebook examples

Resources: popular timeseries packages | knowledge center | faq | linked-in | microprediction.org (dashboard) | microprediction.com (resources) | what | blog | contact | competitions | make-predictions | get-predictions | applications | collective epidemiology

Video tutorials: 1: non-registration | 2: first crawler | 3: retrieving historical data | 4: creating a data stream | 5: modifying your crawler's algorithm | 6: modifying crawler navigation

Colab notebooks: creating a new key | listing current prizes | submitting a prediction | choosing streams | retrieving historical data

Related: humpday | timemachines | timemachines-testing | microconventions | muid | causality graphs | embarrassingly | key maker | real data | chess ratings prediction

Eye candy: copula plots | causality plots | electricity case study

Probably best to start in the knowledge center and remember Dorothy, You're Not in Kaggle Anymore.

Cite

See CITE.md

FAQ:

FAQ

Video tutorials

See the Knowledge Center

Hey, where did the old README go?

README_LONGER.md

README (LONGER)

See the README.md first.

But first, another plea to just run the darn notebook already:

If you don't know about the live algorithm frenzy at microprediction.org then an extremely simple way to grok it is to open this notebook and run it. This will create an identity for you and enter your algorithm in an ongoing contest to predict the next roll of a die. It is a silly little example, but I'm sure you can abstract and generalize from this.

Ultra-Quick start shell script

If you didn't take my advice above, or even if you did, here's another really fast way to get going (linux/osx). Cut and paste to a terminal:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/microprediction/microprediction/master/shell_examples/run_default_crawler_from_new_venv.sh)"

You should run that script "forever". It will print your write key and remind you to plug that into the dashboard to view your progress.

Weekly contributor Google meet

Noon Fridays, EST. Contact us for details. We'll help you get started on the spot.

Examples, examples, examples

As noted, see the knowledge center for a structured set of Python tutorials which will show you how to create an identity, enter a live contest and use the dashboard to track your algorithms' progress. It will also show you how to retrieve historical data for time series research, if that is the only way you wish to use the site. You don't have to use Python because the API can be accessed in any language. We have contributors using Julia (example) and you can even enter using R from within Kaggle (tutorial). Here are some Python examples. Pro tip: look at the leaderboards and click on CODE badges, then fork an algorithm that is doing well.

More discussion and help

Reach us on Linked-In where we are most active. You can discuss on github or contact us directly. By all means raise issues, or even leave messages via Gitter if you wish.

Frequently asked questions

Class Hierarchy

Use MicroReader if you just need to get data and don't care to use a key. Create streams like this using the MicroWriter, or its sub-classes. You can also use MicroWriter to submit predictions, though MicroCrawler adds some conveniences.

    MicroReader
       |
    MicroWriter ----------------------------
       |                                   |
    MicroPoll                         MicroCrawler
    (feed creator)               (self-navigating algorithm)

A more complete picture would include SimpleCrawler, RegularCrawler, OnlineHorizonCrawler, OnlineStreamCrawler and ReportingCrawler, as well as additional conveniences for creating streams such as ChangePoll, MultiPoll, and MultiChangePoll.

Quickstart stream creation: publish a number every 20 minutes

If you have a function that returns a live number, you can do this:

    from microprediction import MicroPoll
    feed = MicroPoll(difficulty=12,                 # This takes a long time ... see section on mining write_keys below
                     name='my_stream.json',         # Name your data stream
                     func=my_feed_func,             # Provide a callback function that returns a float 
                     interval=20)                   # Poll every twenty minutes
    feed.run()                                      # Start the scheduler
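For completeness, a hypothetical my_feed_func might poll a JSON endpoint and return a float. The URL and field name below are placeholders, not a real feed:

    import requests

    def my_feed_func():
        r = requests.get('https://example.com/my_quantity.json', timeout=10)   # Placeholder URL
        return float(r.json()['value'])                                        # Placeholder field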

Retrieving distributional predictions

Once a stream is created and some crawlers have found it, you can view activity and predictions at www.microprediction.org.

| Stream | Roughly 1 min ahead | Roughly 5 min ahead | Roughly 15 min ahead | Roughly 1 hr ahead |
| --- | --- | --- | --- | --- |
| my_stream | stream=my_stream&horizon=70 | stream=my_stream&horizon=310 | stream=my_stream&horizon=910 | stream=my_stream&horizon=3555 |

Full URL example: https://www.microprediction.org/stream_dashboard.html?stream=c5_iota&horizon=70 for a 1 minute ahead CDF. If you wish to use the Python client:

    cdf = feed.get_cdf('cop.json', delay=70, values=[0, 0.5])

where the delay parameter, in seconds, is the prediction horizon (it is called a delay because the predictions used to compute this CDF have all been quarantined for 70 seconds or more). The community of algorithms provides predictions roughly 1 min, 5 min, 15 min and 1 hr ahead of time. The get_cdf() call above reveals the probability that your future value is less than 0.0, and the probability that it is less than 0.5. You can view CDFs and activity at microprediction.org by entering your write key in the dashboard.

Z-Scores

Now we're getting into the fancy stuff.

Based on algorithm predictions, every data point you publish creates another two streams, representing community z-scores for your data point based on predictions quarantined the shortest and the longest intervals.

| Stream | Dashboard URL |
| --- | --- |
| Base stream | https://www.microprediction.org/stream_dashboard.html?stream=c5_iota |
| Z-score relative to 70s ahead predictions | https://www.microprediction.org/stream_dashboard.html?stream=z1~c5_iota~70 |
| Z-score relative to 3555s ahead predictions | https://www.microprediction.org/stream_dashboard.html?stream=z1~c5_iota~3555 |

In turn, each of these streams is predicted at four different horizons, as with the base stream. For example:

| Stream | Roughly 1 min ahead | Roughly 5 min ahead | Roughly 15 min ahead | Roughly 1 hr ahead |
| --- | --- | --- | --- | --- |
| c5_iota | stream=c5_iota&horizon=70 | stream=c5_iota&horizon=310 | stream=c5_iota&horizon=910 | stream=c5_iota&horizon=3555 |
| z1~c5_iota~3555 | stream=z1~c5_iota~3555&horizon=70 | stream=z1~c5_iota~3555&horizon=310 | stream=z1~c5_iota~3555&horizon=910 | stream=z1~c5_iota~3555&horizon=3555 |

Poke around the stream listing near the bottom and you'll see them.
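If you'd rather pull a z-score stream programmatically, here is a minimal sketch using the reader, with the stream name following the z1~stream~delay convention shown above:

    from microprediction import MicroReader

    mr = MicroReader()
    zs = mr.get_lagged_values('z1~c5_iota~70.json')   # Recent community z-scores at the 70s horizon
    print(zs[:5])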

Crawling

See also the public api guide. If you have a function that takes a vector of lagged values of a time series and supplies a distributional prediction, a fast way to get going is to derive from MicroCrawler as follows:

    from microprediction import MicroCrawler, create_key
    from microprediction.samplers import differenced_bootstrap
    
    class MyCrawler(MicroCrawler):
    
        def sample(self, lagged_values, lagged_times=None, name=None, delay=None):
            my_point_estimate = 0.75*lagged_values[0]+0.25*lagged_values[1]                                     # You can do better
            scenarios = differenced_bootstrap(lagged=lagged_values,  decay=0.01, num=self.num_predictions)      # You can do better
            samples = [ my_point_estimate+s for s in scenarios ]
            return samples

    my_write_key = create_key(difficulty=11)   # Be patient. Maybe visit www.MUID.org to learn about Memorable Unique Identifiers 
    print(my_write_key)
    crawler = MyCrawler(write_key=my_write_key)
    crawler.run()

Enter your write_key into https://www.microprediction.org/dashboard.html to find out which time series your crawler is good at predicting. Check back in a day, a week or a month.

The crawler is also a reader and a writer, so a little about those next.

Reading

It is possible to retrieve most quantities at api.microprediction.org with direct web calls such as https://api.microprediction.org/live/c5_iota.json. Use your preferred means, such as requests or aiohttp. For example, using the former:

    import requests
    lagged_values = requests.get('https://api.microprediction.org/live/lagged_values::c5_iota.json').json()
    lagged        = requests.get('https://api.microprediction.org/lagged/c5_iota.json').json()

However, the reader client adds a little convenience:

    from microprediction import MicroReader
    mr = MicroReader()
 
    current_value = mr.get('c5_iota.json')
    lagged_values = mr.get_lagged_values('c5_iota.json') 
    lagged_times  = mr.get_lagged_times('c5_iota.json')

Your best reference for the API is the client code https://github.com/microprediction/microprediction/blob/master/microprediction/reader.py

Writing

As noted above you may prefer to use MicroPoll or MicroCrawler rather than MicroWriter directly. But here are a few more details on the API wrapper for those wanting more control. You can create predictions or feeds using only the writer. Your best reference is the client code https://github.com/microprediction/microprediction/blob/master/microprediction/writer.py

Instantiate a writer

In principle:

    from microprediction import MicroWriter
    mw = MicroWriter(difficulty=12)    # Creates new key on the fly, slowly! MUIDs explained at https://vimeo.com/397352413 

But better to do

      from microprediction import new_key
      write_key = new_key(difficulty=12)

separately, then pass in with

      mw = MicroWriter(write_key=write_key)

The thing is, new_key() at difficulty 12 will take many hours, and that deters the system from being flooded with spurious streams. See https://config.microprediction.org/config.json for the current value of min_len, which is the official minimum difficulty to create a stream. If you don't need to create streams but only wish to predict, you can use a lower difficulty like 10 or even 9. But the easier your key, the more likely you are to go bankrupt (read on).

Submitting scenarios (manually)

If MicroCrawler does not float your boat, you can design your own way to monitor streams and make predictions using MicroWriter.

    scenarios = [ i*0.001 for i in range(mw.num_predictions) ]    # You can do better!
    mw.submit(name='c5_iota.json', values=scenarios, delay=70)    # Specify stream name and prediction horizon

See https://config.microprediction.org/config.json for a list of values that delay can take.
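You can also fetch that configuration programmatically. A small sketch, assuming the config exposes 'delays' and 'min_len' keys consistent with the horizons and difficulty discussed above:

    import requests

    config = requests.get('https://config.microprediction.org/config.json').json()
    print(config.get('delays'))    # Assumed key: permitted delay values, e.g. [70, 310, 910, 3555]
    print(config.get('min_len'))   # Assumed key: minimum difficulty for stream creation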

Creating a feed (manually)

If MicroPoll does not serve your needs you can create your stream one data point at a time:

    mw  = MicroWriter(write_key=write_key)
    res = mw.set(name='mystream.json',value=3.14157) 

However if you don't do this regularly, your stream's history will die and you will lose rights to the name 'mystream.json' established when you made the first call. If you have a long break between data points, such as overnight or over the weekend, consider touching the data stream:

    res = mw.touch(name='mystream.json')

to let the system know you still care.
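A minimal keep-alive pattern, assuming hypothetical market_is_open() and get_value() helpers, might look like:

    import time

    while True:
        if market_is_open():                                  # Hypothetical predicate for your source
            mw.set(name='mystream.json', value=get_value())   # Hypothetical live value
        else:
            mw.touch(name='mystream.json')                    # Keep the stream alive
        time.sleep(20 * 60)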

Troubleshooting stream creation

1. Upgrade the library, which is pretty fluid:

       pip install --upgrade microprediction

2. Check stream_conventions to see if you are violating a stream naming convention:
   - Must end in .json
   - Must contain only alphanumerics, hyphens, underscores, colons (discouraged) and at most one period.
   - Must not contain a double colon.
3. Log into the dashboard at https://www.microprediction.org/dashboard.html with your write_key and check for errors or warnings. You can also use mw.get_errors(), mw.get_warnings() and mw.get_confirmations(), as in the sketch below. Ask yourself:
   - Was the name already taken?
   - Is your write_key bankrupt?
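For instance, using the diagnostic calls mentioned in the list above:

    from microprediction import MicroWriter

    mw = MicroWriter(write_key=write_key)
    print(mw.get_errors())           # Recent errors logged against your write_key
    print(mw.get_warnings())         # Recent warnings
    print(mw.get_confirmations())    # Confirmations of accepted data points and submissions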

Write key mining script

Want more write keys? Cut and paste this bash command into a bash shell:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/microprediction/muid/master/examples/mine_from_venv.sh)"

or use the MUID library (www.muid.org) ...

    $ pip install muid
    $ python3
    >>> import muid
    >>> muid.mine(skip_intro=True)

See www.muid.org or https://vimeo.com/397352413 for more on MUIDs. Use a URL like http://www.muid.org/validate/fb74baf628d43892020d803614f91f29 to reveal the hidden "spirit animal" in a MUID. The difficulty is the length of the animal, not including the space.

Balances and bankruptcy

See bankruptcy

Advanced topic: Higher dimensional prediction with cset()

Multivariate prediction solicitation is available to those with write_keys of difficulty 1 more than the stream minimum (i.e. 12+1). If you want to use this we suggest you start mining now. By making regular calls to mw.cset() you can get all these goodies automatically:

| Functionality | Example dashboard URL |
| --- | --- |
| Base stream #1 | https://www.microprediction.org/stream_dashboard.html?stream=c5_iota |
| Base stream #2 | https://www.microprediction.org/stream_dashboard.html?stream=c5_bitcoin |
| Z-scores | https://www.microprediction.org/stream_dashboard.html?stream=z1~c5_iota~310 |
| Bivariate copula | https://www.microprediction.org/stream_dashboard.html?stream=z2~c5_iota~pe~910 |
| Trivariate copula | https://www.microprediction.org/stream_dashboard.html?stream=z3~c5_iota~c5_bitcoin~pe~910 |

Copula time series are univariate. An embedding from R^3 or R^2 to R is used (a Morton space-filling Z-curve). The most up to date reference for these embeddings is the code (see zcurve_conventions). There is a little video of the embedding in the FAQ.
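A sketch of the pattern follows. Check writer.py (linked above) for the exact signature; here cset() is assumed to take parallel lists of stream names and values, by analogy with set():

    from microprediction import MicroWriter

    mw = MicroWriter(write_key=write_key)    # Needs difficulty 13 as noted above
    names = ['my_stream_a.json', 'my_stream_b.json']
    values = [my_func_a(), my_func_b()]      # my_func_a and my_func_b are hypothetical live sources
    res = mw.cset(names=names, values=values)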

Miscellaneous collateral / blog articles

As noted, this project is socialized mostly via linked-in and the knowledge center is a good place to start.

Some of the blog articles might help introduce microprediction:

Presentations at Rutgers, MIT and elsewhere can be found in the presentations repo. There are also links to video presentations in some of the blog articles.

There are also some articles that pre-date the blog: Online Distributional Estimation | Badminton | Helicopulas.

The longer "why" stuff, if you have the time and inclination: there's a first glimpse, some categories of business application, some remarks on why microprediction is synonymous with AI (due to the possibility of value function prediction), and a straightforward plausibility argument for why an open source, openly networked collection of algorithms, perfectly capable of managing each other, will sooner or later eclipse all other modes of production of prediction.