<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Tobias Raabe</title>
<link>https://tobiasraabe.github.io/</link>
<atom:link href="https://tobiasraabe.github.io/index.xml" rel="self" type="application/rss+xml" />
<description>Tobias Raabe</description>
<generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 26 May 2022 00:00:00 +0000</lastBuildDate>
<image>
<url>https://tobiasraabe.github.io/media/icon_hud96e8d92d6067df3d56ed86675d346c3_18348_512x512_fill_lanczos_center_3.png</url>
<title>Tobias Raabe</title>
<link>https://tobiasraabe.github.io/</link>
</image>
<item>
<title>pytask</title>
<link>https://tobiasraabe.github.io/slides/pytask/</link>
<pubDate>Thu, 26 May 2022 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/slides/pytask/</guid>
<description>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/slides/pytask/logo_hu54cd4ce3cc6ec9b4ff06832acd0c1acc_11108_6713905054386d3777d250888f80694b.webp 400w,
/slides/pytask/logo_hu54cd4ce3cc6ec9b4ff06832acd0c1acc_11108_668fd313a364b8aba6054ca86bc985ac.webp 760w,
/slides/pytask/logo_hu54cd4ce3cc6ec9b4ff06832acd0c1acc_11108_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/slides/pytask/logo_hu54cd4ce3cc6ec9b4ff06832acd0c1acc_11108_6713905054386d3777d250888f80694b.webp"
width="650"
height="252"
loading="lazy" data-zoomable /></div>
</div></figure>
<p>A workflow management system for reproducible data analyses</p>
<hr>
<h2 id="why-do-you-need-pytask">Why do you need pytask?</h2>
<p>You&rsquo;ve probably organized your project with a folder structure like the one below, where each
folder contains scripts carrying out specific tasks.</p>
<p>But how do you execute all tasks and keep your project in sync?</p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/slides/pytask/structure_hud006443189d4e93bacc3dca5e41bafdf_93049_b1eff23266637135116c98e2cd9116db.webp 400w,
/slides/pytask/structure_hud006443189d4e93bacc3dca5e41bafdf_93049_c9650e332161839b5b515931ba8bb645.webp 760w,
/slides/pytask/structure_hud006443189d4e93bacc3dca5e41bafdf_93049_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/slides/pytask/structure_hud006443189d4e93bacc3dca5e41bafdf_93049_b1eff23266637135116c98e2cd9116db.webp"
width="50%"
height="50%"
loading="lazy" data-zoomable /></div>
</div></figure>
<hr>
<h2 id="how-does-pytask-help-you">How does pytask help you?</h2>
<ul>
<li>By defining dependencies and products of tasks, you implicitly define an execution
order.</li>
<li>pytask validates this definition</li>
<li>and executes only tasks which need to be updated.</li>
</ul>
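<p>As a rough illustration of the first point, the sketch below derives an execution order from declared dependencies and products with Python&rsquo;s standard-library <code>graphlib</code> (Python 3.9+). It is not pytask&rsquo;s implementation, and the task and file names are made up for the example.</p>

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each lists the files it depends on and the files it produces.
tasks = {
    "task_plot": {"depends_on": ["clean.csv"], "produces": ["plot.png"]},
    "task_clean": {"depends_on": ["raw.csv"], "produces": ["clean.csv"]},
    "task_download": {"depends_on": [], "produces": ["raw.csv"]},
}

# Map every product to the task that creates it.
producer = {
    product: name for name, spec in tasks.items() for product in spec["produces"]
}

# A task must run after every task that produces one of its dependencies.
graph = {
    name: {producer[dep] for dep in spec["depends_on"] if dep in producer}
    for name, spec in tasks.items()
}

execution_order = list(TopologicalSorter(graph).static_order())
print(execution_order)
```

<p>If two tasks depended on each other&rsquo;s products, <code>graphlib</code> would raise a <code>CycleError</code>, which is one kind of problem a validation step can catch before anything is executed.</p>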
<hr>
<h2 id="how-do-you-define-tasks-dependencies-and-products">How do you define tasks, dependencies, and products?</h2>
<p>Tasks are functions whose names start with <code>task_</code>. Use decorators to specify the dependencies
and products of a task. By adding <code>depends_on</code> and <code>produces</code> as function arguments, you can access
the paths to the files inside the function body.</p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/slides/pytask/task_hua2bf14151676cf7927f6f127b73b7943_160385_c45300d44c89c981cd6cea53d701728f.webp 400w,
/slides/pytask/task_hua2bf14151676cf7927f6f127b73b7943_160385_d00940f96ed2e16fb9bfad85d106d918.webp 760w,
/slides/pytask/task_hua2bf14151676cf7927f6f127b73b7943_160385_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/slides/pytask/task_hua2bf14151676cf7927f6f127b73b7943_160385_c45300d44c89c981cd6cea53d701728f.webp"
width="70%"
height="70%"
loading="lazy" data-zoomable /></div>
</div></figure>
<hr>
<h2 id="execute-a-task">Execute a task</h2>
<p>Type <code>pytask</code> in your terminal, and it will automatically collect and execute all tasks.</p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/slides/pytask/execute_hufd282194fd395852446f3a6f0d6458e7_63164_6e9d542121037bdd8b75cf6e5ecdc5d2.webp 400w,
/slides/pytask/execute_hufd282194fd395852446f3a6f0d6458e7_63164_0ae5bee1a07736f41a6b1e73b787e655.webp 760w,
/slides/pytask/execute_hufd282194fd395852446f3a6f0d6458e7_63164_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/slides/pytask/execute_hufd282194fd395852446f3a6f0d6458e7_63164_6e9d542121037bdd8b75cf6e5ecdc5d2.webp"
width="70%"
height="70%"
loading="lazy" data-zoomable /></div>
</div></figure>
<hr>
<h2 id="what-are-the-benefits">What are the benefits?</h2>
<p>👉 Automation reduces errors and increases reproducibility.</p>
<p>👉 The build process is documented in code.</p>
<p>👉 You can iterate faster and be more productive.</p>
<hr>
<h2 id="research">Research</h2>
<p>Is pytask used for actual research? Yes!</p>
<p>Here is a Covid-19 forecast project with an agent-based model, 10+ datasets, many different policy scenarios and 1,000+ simulations.</p>
<p>👉 <a href="https://arxiv.org/abs/2106.11129" target="_blank" rel="noopener">https://arxiv.org/abs/2106.11129</a></p>
<hr>
<h2 id="teaching">Teaching</h2>
<p>pytask is also part of a graduate course, teaching economists programming and best practices for research projects.</p>
<p>👉 <a href="https://github.com/OpenSourceEconomics/econ-project-templates" target="_blank" rel="noopener">https://github.com/OpenSourceEconomics/econ-project-templates</a></p>
<hr>
<h2 id="what-else-has-pytask-to-offer">What else does pytask have to offer?</h2>
<p>Scale your project by repeating tasks! 🚀</p>
<p>For example, create ten different datasets with randomly generated data.</p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/slides/pytask/repeat_huf3e0e76d8cc33474fa39ec2bee5ffebc_148080_dc3afaefc80d44e7d270a96621fc531a.webp 400w,
/slides/pytask/repeat_huf3e0e76d8cc33474fa39ec2bee5ffebc_148080_c154bcbcab4bf809e87caf157cf8dc6a.webp 760w,
/slides/pytask/repeat_huf3e0e76d8cc33474fa39ec2bee5ffebc_148080_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/slides/pytask/repeat_huf3e0e76d8cc33474fa39ec2bee5ffebc_148080_dc3afaefc80d44e7d270a96621fc531a.webp"
width="70%"
height="70%"
loading="lazy" data-zoomable /></div>
</div></figure>
<hr>
<h2 id="customize-pytask-with-plugins">Customize pytask with plugins!</h2>
<ul>
<li>Automatically parallelize the execution with <a href="https://github.com/pytask-dev/pytask-parallel" target="_blank" rel="noopener">https://github.com/pytask-dev/pytask-parallel</a> ⚡️</li>
<li>Support for executing R, Julia, and Stata scripts.</li>
<li>All plugins are here: <a href="https://pytask-dev.readthedocs.io/en/stable/plugin_list.html" target="_blank" rel="noopener">https://pytask-dev.readthedocs.io/en/stable/plugin_list.html</a></li>
</ul>
<hr>
<h2 id="templating">Templating</h2>
<p>Start a new project from a template!</p>
<p>A minimal template: <a href="https://github.com/pytask-dev/cookiecutter-pytask-project" target="_blank" rel="noopener">https://github.com/pytask-dev/cookiecutter-pytask-project</a></p>
<p>A template for reproducible economics projects: <a href="https://github.com/OpenSourceEconomics/econ-project-templates" target="_blank" rel="noopener">https://github.com/OpenSourceEconomics/econ-project-templates</a></p>
<hr>
<h2 id="debugging">Debugging</h2>
<p>Enter the debugger if one of your tasks fails and you want to find out why! 🏗️</p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/slides/pytask/debugging_hu311174a0ea4f8d8d0d105c321b42522f_49944_902fdde7c463d2fe50ea34b000a83672.webp 400w,
/slides/pytask/debugging_hu311174a0ea4f8d8d0d105c321b42522f_49944_f6c550f5f722609531459e9d6d8657ee.webp 760w,
/slides/pytask/debugging_hu311174a0ea4f8d8d0d105c321b42522f_49944_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/slides/pytask/debugging_hu311174a0ea4f8d8d0d105c321b42522f_49944_902fdde7c463d2fe50ea34b000a83672.webp"
width="70%"
height="70%"
loading="lazy" data-zoomable /></div>
</div></figure>
<hr>
<h2 id="documentation">Documentation</h2>
<p>You can find out more about pytask in the documentation: <a href="https://pytask-dev.readthedocs.io/" target="_blank" rel="noopener">https://pytask-dev.readthedocs.io/</a>.</p>
<p>Follow the tutorials for a step-by-step introduction: <a href="https://pytask-dev.readthedocs.io/en/stable/tutorials/index.html" target="_blank" rel="noopener">https://pytask-dev.readthedocs.io/en/stable/tutorials/index.html</a></p>
<hr>
<h2 id="ecosystem">Ecosystem</h2>
<p>pytask is also part of a more extensive ecosystem of research tools developed at <a href="https://twitter.com/open_econ" target="_blank" rel="noopener">@open_econ</a>.</p>
<p>We will soon write about tools like <a href="https://github.com/OpenSourceEconomics/estimagic" target="_blank" rel="noopener">estimagic</a>, a package for complex numerical optimization, and estimation/calibration of scientific models.</p>
<hr>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>Thanks for staying with me until the end! Finally, some shout-outs to amazing people and projects.</p>
<p>Thanks to <a href="https://twitter.com/kroehrl" target="_blank" rel="noopener">@kroehrl</a>, <a href="https://twitter.com/JanosGabler" target="_blank" rel="noopener">@JanosGabler</a>, and <a href="https://twitter.com/econ_hmg" target="_blank" rel="noopener">@econ_hmg</a>, who helped me build this tool in endless and fruitful discussions! 🙇</p>
<hr>
<h2 id="acknowledgements-1">Acknowledgements</h2>
<p>pytask stands on the shoulders of these projects. Thank you! 🙏</p>
<ul>
<li><a href="https://twitter.com/pytestdotorg" target="_blank" rel="noopener">@pytestdotorg</a> for pytest and pluggy.</li>
<li><a href="https://twitter.com/textualizeio" target="_blank" rel="noopener">@textualizeio</a> for the command-line interface created with
<a href="https://github.com/Textualize/rich" target="_blank" rel="noopener">rich</a>.</li>
<li><a href="https://twitter.com/_darrenburns" target="_blank" rel="noopener">@_darrenburns</a> for parametrizations borrowed from
<a href="https://github.com/darrenburns/ward" target="_blank" rel="noopener">ward</a>.</li>
</ul>
</description>
</item>
<item>
<title>gettsim</title>
<link>https://tobiasraabe.github.io/project/gettsim/</link>
<pubDate>Thu, 19 May 2022 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/project/gettsim/</guid>
<description><p>GETTSIM provides a depiction of the German Taxes and Transfers System that is usable in
a wide array of research applications, ranging from complex dynamic programming models
to detailed microsimulation studies.</p>
<p>I spent a short time on this project and created the DAG-based backend. Users define
functions that compute quantities of a tax and transfer system, and the backend figures
out the execution order itself.</p>
<p>This allows users to flexibly define their tax and transfer system or modify an existing
system by replacing some functions.</p>
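<p>The idea can be sketched in a few lines of plain Python: argument names are matched to function names, and the standard library sorts the resulting graph. This is only an illustration under that assumption, not the API of GETTSIM or its backend; the functions, rates, and numbers are invented.</p>

```python
import inspect
from graphlib import TopologicalSorter


# Hypothetical rules: an argument named like another function depends on it.
def taxable_income(gross_income, deductions):
    return max(gross_income - deductions, 0)


def income_tax(taxable_income):
    return taxable_income // 4


def flat_income_tax(taxable_income):
    return taxable_income // 10


def net_income(gross_income, income_tax):
    return gross_income - income_tax


def compute(functions, inputs):
    """Evaluate all functions in an order derived from their argument names."""
    # Each function depends on those arguments that are themselves functions.
    graph = {
        name: {arg for arg in inspect.signature(func).parameters if arg in functions}
        for name, func in functions.items()
    }
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        func = functions[name]
        arguments = {arg: results[arg] for arg in inspect.signature(func).parameters}
        results[name] = func(**arguments)
    return results


functions = {
    "net_income": net_income,
    "income_tax": income_tax,
    "taxable_income": taxable_income,
}
inputs = {"gross_income": 40_000, "deductions": 4_000}
baseline = compute(functions, inputs)

# Modify the system by replacing a single function.
reform = compute({**functions, "income_tax": flat_income_tax}, inputs)
print(baseline["net_income"], reform["net_income"])
```

<p>The order in which the functions are listed does not matter; only their argument names determine what is computed first.</p>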
<p>The DAG-based backend was eventually extracted into its own package so that
everyone can use it and embed it into their applications. You can find it here:
<a href="https://github.com/OpenSourceEconomics/dags" target="_blank" rel="noopener">https://github.com/OpenSourceEconomics/dags</a>.</p>
</description>
</item>
<item>
<title>pytask</title>
<link>https://tobiasraabe.github.io/project/pytask/</link>
<pubDate>Thu, 19 May 2022 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/project/pytask/</guid>
<description></description>
</item>
<item>
<title>respy</title>
<link>https://tobiasraabe.github.io/project/respy/</link>
<pubDate>Thu, 19 May 2022 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/project/respy/</guid>
<description></description>
</item>
<item>
<title>sid</title>
<link>https://tobiasraabe.github.io/project/sid/</link>
<pubDate>Thu, 19 May 2022 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/project/sid/</guid>
<description></description>
</item>
<item>
<title>The Effectiveness of Testing, Vaccinations and Contact Restrictions for Containing the CoViD-19 Pandemic</title>
<link>https://tobiasraabe.github.io/publication/sid/</link>
<pubDate>Fri, 02 Jul 2021 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/publication/sid/</guid>
<description>
</description>
</item>
<item>
<title>How I write tests</title>
<link>https://tobiasraabe.github.io/post/how-i-write-tests/</link>
<pubDate>Wed, 31 Mar 2021 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/post/how-i-write-tests/</guid>
<description><p>Hi everybody,</p>
<p>I assume that all of you write tests for Python programs with
<a href="https://pytest.org/" target="_blank" rel="noopener">pytest</a>. If you do not use pytest, or do not write
tests at all, check out the following links, which provide examples and an
overview of pytest&rsquo;s capabilities.</p>
<ul>
<li><a href="https://realpython.com/pytest-python-testing/" target="_blank" rel="noopener">Effective Python Testing With Pytest - Real
Python</a></li>
<li><a href="https://raphael.codes/blog/customizing-your-pytest-test-suite-part-1/" target="_blank" rel="noopener">Customizing your pytest suite (Part 1) - Raphael
Pierzina</a></li>
<li><a href="https://raphael.codes/blog/customizing-your-pytest-test-suite-part-2/" target="_blank" rel="noopener">Customizing your pytest suite (Part 2) - Raphael
Pierzina</a></li>
</ul>
<p>Maybe you have also heard about test-driven development (TDD), but I have little
experience with it myself. If you have a great resource for beginners, send it my way
and I can include it here.</p>
<p>What I did not find in these guides is a combination of patterns I use fairly often to
write tests. Hopefully, it is useful for you as well. Let&rsquo;s go!</p>
<h2 id="the-function">The function</h2>
<p>First, here is the function we are going to test. The function takes any number of paths
and tries to find the longest parent path common to all paths.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">PurePath</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Union</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">find_common_ancestor</span><span class="p">(</span><span class="o">*</span><span class="n">paths</span><span class="p">:</span> <span class="n">Union</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Path</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">Path</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Find a common ancestor of many paths.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">paths</span> <span class="o">=</span> <span class="p">[</span><span class="n">path</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">PurePath</span><span class="p">)</span> <span class="k">else</span> <span class="n">Path</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">paths</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">paths</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="ow">not</span> <span class="n">path</span><span class="o">.</span><span class="n">is_absolute</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">            <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="sa">f</span><span class="s2">&#34;Cannot find common ancestor for relative paths. </span><span class="si">{</span><span class="n">path</span><span class="si">}</span><span class="s2"> is relative.&#34;</span>
</span></span><span class="line"><span class="cl">            <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">common_parents</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="o">*</span><span class="p">[</span><span class="nb">set</span><span class="p">(</span><span class="n">path</span><span class="o">.</span><span class="n">parents</span><span class="p">)</span> <span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">paths</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">common_parents</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&#34;Paths have no common ancestor.&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">longest_parent</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">common_parents</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">parts</span><span class="p">))[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">longest_parent</span>
</span></span></code></pre></div><p>Here is an example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">&gt;&gt;&gt; find_common_ancestor(&#34;C:\\Users\\Tobias&#34;, &#34;C:\\Users\\Tobi&#34;)
</span></span><span class="line"><span class="cl">WindowsPath(&#39;C:/Users&#39;)
</span></span></code></pre></div><p>The function raises a <code>ValueError</code> if &hellip;</p>
<ul>
<li>one of the paths is relative.</li>
<li>the paths do not have a common ancestor.</li>
</ul>
<h2 id="the-test-function">The test function</h2>
<p>I will first show you the test function and then comment on some details.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">ExitStack</span> <span class="k">as</span> <span class="n">does_not_raise</span>  <span class="c1"># noqa: N813</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">PurePosixPath</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">PureWindowsPath</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">pytest</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">unit</span>
</span></span><span class="line"><span class="cl"><span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">parametrize</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;path_1, path_2, expectation, expected&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">PurePosixPath</span><span class="p">(</span><span class="s2">&#34;relative_1&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PurePosixPath</span><span class="p">(</span><span class="s2">&#34;/home/relative_2&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="ne">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="s2">&#34;Cannot find common ancestor for relative paths.&#34;</span>
</span></span><span class="line"><span class="cl">            <span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="kc">None</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nb">id</span><span class="o">=</span><span class="s2">&#34;test path 1 is relative&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;C:/home/relative_1&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;relative_2&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                <span class="ne">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="s2">&#34;Cannot find common ancestor for relative paths.&#34;</span>
</span></span><span class="line"><span class="cl">            <span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="kc">None</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nb">id</span><span class="o">=</span><span class="s2">&#34;test path 2 is relative&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">PurePosixPath</span><span class="p">(</span><span class="s2">&#34;/home/user/folder_a&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PurePosixPath</span><span class="p">(</span><span class="s2">&#34;/home/user/folder_b/sub_folder&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">does_not_raise</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PurePosixPath</span><span class="p">(</span><span class="s2">&#34;/home/user&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="nb">id</span><span class="o">=</span><span class="s2">&#34;normal behavior with UNIX paths&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;C:</span><span class="se">\\</span><span class="s2">home</span><span class="se">\\</span><span class="s2">user</span><span class="se">\\</span><span class="s2">folder_a&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;C:</span><span class="se">\\</span><span class="s2">home</span><span class="se">\\</span><span class="s2">user</span><span class="se">\\</span><span class="s2">folder_b</span><span class="se">\\</span><span class="s2">sub_folder&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">does_not_raise</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;C:</span><span class="se">\\</span><span class="s2">home</span><span class="se">\\</span><span class="s2">user&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="nb">id</span><span class="o">=</span><span class="s2">&#34;normal behavior with Windows paths&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">pytest</span><span class="o">.</span><span class="n">param</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;C:</span><span class="se">\\</span><span class="s2">home</span><span class="se">\\</span><span class="s2">user</span><span class="se">\\</span><span class="s2">folder_a&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">PureWindowsPath</span><span class="p">(</span><span class="s2">&#34;D:</span><span class="se">\\</span><span class="s2">home</span><span class="se">\\</span><span class="s2">user</span><span class="se">\\</span><span class="s2">folder_b</span><span class="se">\\</span><span class="s2">sub_folder&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="s2">&#34;Paths have no common ancestor.&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="kc">None</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nb">id</span><span class="o">=</span><span class="s2">&#34;no common ancestor&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="p">],</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">test_find_common_ancestor</span><span class="p">(</span><span class="n">path_1</span><span class="p">,</span> <span class="n">path_2</span><span class="p">,</span> <span class="n">expectation</span><span class="p">,</span> <span class="n">expected</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">with</span> <span class="n">expectation</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">result</span> <span class="o">=</span> <span class="n">find_common_ancestor</span><span class="p">(</span><span class="n">path_1</span><span class="p">,</span> <span class="n">path_2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">expected</span>
</span></span></code></pre></div><ul>
<li>
<p>Use <code>pytest.mark.parametrize</code> to minimize the test code and to make adding more
tests easier.</p>
</li>
<li>
<p>Use <code>pytest.param</code> to wrap each iteration. It lets you add the <code>id</code> parameter to
each iteration. Use the id to document the specific test case. With many test cases,
you will quickly forget the purpose of each single test.</p>
</li>
<li>
<p>The third argument of the parametrization, <code>expectation</code>, can be used to assert that
the tested function raises an exception. If no exception is expected, use
<code>does_not_raise()</code>.</p>
</li>
<li>
<p>If you expect an exception, you can pick an arbitrary object as the <code>expected</code>
output because the <code>assert</code> statement is never reached once the exception is raised.</p>
</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>I hope you enjoyed this tutorial. Feel free to send me any feedback.</p>
<p><em>PS: When I started writing this guide, I discovered this
<a href="https://docs.python.org/3/library/os.path.html#os.path.commonpath" target="_blank" rel="noopener">function</a>. Maybe I
do not need my implementation plus the tests</em>.</p>
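<p>As a rough sketch of how that standard-library function handles the two Windows cases from the tests: <code>ntpath</code> is the Windows flavor of <code>os.path</code> and can be imported on any platform, so the example does not depend on the operating system. Note that it works on strings rather than <code>PureWindowsPath</code> objects, and the variable names are only for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import ntpath

# Two paths on the same drive share the ancestor C:\home\user.
paths = [r"C:\home\user\folder_a", r"C:\home\user\folder_b\sub_folder"]
common = ntpath.commonpath(paths)
print(common)

# Paths on different drives share no ancestor, so commonpath raises ValueError.
different_drives_error = None
try:
    ntpath.commonpath([r"C:\home\user\folder_a", r"D:\home\user\folder_b"])
except ValueError as error:
    different_drives_error = error
    print(error)
</code></pre></div>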
</description>
</item>
<item>
<title>What I have been doing lately</title>
<link>https://tobiasraabe.github.io/post/what-i-have-been-doing-lately/</link>
<pubDate>Wed, 30 Sep 2020 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/post/what-i-have-been-doing-lately/</guid>
<description><p>Hi everybody,</p>
<p>I have not written a post in a long time, so I am resuming this blog by keeping you
posted on some of my more recent projects.</p>
<p>Over the last year, I have been developing or contributing to several research applications. I
learned a lot about software engineering and designing applications. Maybe I will find
the time to write a post on some of the things I learned along the way.</p>
<h2 id="pytaskhahahugoshortcode-s0-hbhb"><a href="https://tobiasraabe.github.io/project/pytask/">pytask</a></h2>
<p>The project I am most excited about is <a href="https://github.com/pytask-dev/pytask" target="_blank" rel="noopener">pytask</a>, a
build system designed for researchers to run their project pipeline, from data
preparation through analyses to compiling the reports.</p>
<p>I was highly frustrated with existing solutions and programmed my own build system. The
interface is one of the highlights. It is similar to pytest to lower the entry barrier
but more beautiful because it uses <a href="https://textualize/rich" target="_blank" rel="noopener">rich</a>. pytask uses
<a href="https://github.com/pytest-dev/pluggy" target="_blank" rel="noopener">pluggy</a> under the hood to offer a plugin system.</p>
<p>If you already know my <a href="https://github.com/tobiasraabe/cookiecutter-research-template" target="_blank" rel="noopener">cookiecutter for reproducible
research</a>, you know Waf.
pytask replaces Waf. I will probably not update the cookiecutter for the foreseeable
future and, instead, I recommend <a href="https://github.com/hmgaudecker/econ-project-templates" target="_blank" rel="noopener">Hans-Martin&rsquo;s cookiecutter</a>, which will support pytask.</p>
<p>Please take a look at pytask and try it out in your next project. I appreciate any
feedback, comments, feature requests, and harsh criticism :).</p>
<p>I already gave a presentation about pytask&rsquo;s design, which I will probably post here in
a few weeks. Half of it is about plugin architectures in general and pluggy, and the
other half is about pytask.</p>
<h2 id="respyhahahugoshortcode-s1-hbhb"><a href="https://tobiasraabe.github.io/project/respy/">respy</a></h2>
<p>Together with <a href="https://github.com/janosg" target="_blank" rel="noopener">Janos</a>, I created a framework for a certain
class of econometric models called
<a href="https://github.com/OpenSourceEconomics/respy" target="_blank" rel="noopener">respy</a>.</p>
<p>For the insiders, it is a framework for finite-horizon discrete choice dynamic
programming models, also called Eckstein-Keane-Wolpin models. Researchers use them to
study the human capital accumulation process in the labor market.</p>
<p>The documentation is quite extensive for such a young project, and the contributors taking over
the project plan to extend it with even more examples and applications.</p>
<p>If you are an economist interested in structural modeling, it might be an excellent
place to start. Even if you do not want to use this model, you might be able to get some
inspiration for your own model.</p>
<p>I learned a lot about building interfaces people can use and how to write performant
code by choosing the right design and Numba where necessary.</p>
<h2 id="gettsimhahahugoshortcode-s2-hbhb"><a href="https://tobiasraabe.github.io/project/gettsim/">gettsim</a></h2>
<p>I redesigned the computational backend of
<a href="https://github.com/iza-institute-of-labor-economics/gettsim" target="_blank" rel="noopener">gettsim</a> with
<a href="https://github.com/janosg" target="_blank" rel="noopener">Janos</a> and <a href="https://github.com/hmgaudecker" target="_blank" rel="noopener">Hans-Martin</a>.
gettsim offers a representation of the German tax and transfer system and allows
researchers to study the impact of reforms on the amount of taxes and benefits people
face.</p>
<p>The task was to design an interface that allows users to modify or extend the
pre-implemented tax and transfer system.</p>
<p>Our solution combines the idea behind pytest&rsquo;s fixtures with a DAG (directed acyclic
graph).</p>
<figure id="figure-a-subset-of-the-german-tax-and-transfer-system">
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="A subset of the German tax and transfer system." srcset="
/post/what-i-have-been-doing-lately/gettsim_hu7a411bc6bbb514c1a204eaad6e6f2294_27638_1a832a87a02c02082e864f9e06ee961c.webp 400w,
/post/what-i-have-been-doing-lately/gettsim_hu7a411bc6bbb514c1a204eaad6e6f2294_27638_39fc8dbc1761862e1d7a094a243ac647.webp 760w,
/post/what-i-have-been-doing-lately/gettsim_hu7a411bc6bbb514c1a204eaad6e6f2294_27638_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/post/what-i-have-been-doing-lately/gettsim_hu7a411bc6bbb514c1a204eaad6e6f2294_27638_1a832a87a02c02082e864f9e06ee961c.webp"
width="601"
height="604"
loading="lazy" data-zoomable /></div>
</div><figcaption>
A subset of the German tax and transfer system.
</figcaption></figure>
<p>You can view the whole tax and transfer system as an extensive network. In this network,
nodes are quantities like child benefits, capital gains, or taxes on capital gains.
Edges represent how quantities relate to each other. For example, taxes paid on capital
gains are derived from capital gains subject to income tax. Each quantity is either part of the
data or computed by a function.</p>
<div class="alert alert-note">
<div>
The network is a directed graph because edges point in one direction. And it is acyclic
since there are no cycles in this graph, meaning you will never return to the same node
following the edges. These properties make the network a directed acyclic graph or a
DAG.
</div>
</div>
<p>This network view has a couple of benefits.</p>
<ul>
<li>
<p>A quantity can be computed once and then passed to the following nodes, saving runtime
and reducing code duplication.</p>
</li>
<li>
<p>If you want to model a policy change, you can single out the relevant nodes in the
network and modify the underlying functions.</p>
</li>
<li>
<p>If you are interested only in a subset of the tax and transfer system, subset the network
and remove unnecessary nodes.</p>
</li>
</ul>
<p>This flexibility is highly desirable, but what does the interface look like for a user?</p>
<p>Here, we use the idea of pytest&rsquo;s fixtures, where using a fixture&rsquo;s name as an argument
in a test function gives you access to the return value of the fixture inside the test
function. Similarly, a function in gettsim looks like this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">child_benefits</span><span class="p">(</span><span class="n">n_children</span><span class="p">,</span> <span class="n">parameters</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="n">n_children</span> <span class="o">*</span> <span class="n">parameters</span><span class="p">[</span><span class="s2">&#34;child_benefits&#34;</span><span class="p">]</span>
</span></span></code></pre></div><p>Here, <code>n_children</code> is either a variable in the input data or a function with the same
name which computes the quantity.</p>
<p>From each function&rsquo;s name and its argument names, we can build a DAG that allows us to
determine an execution order for the functions.</p>
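<p>A minimal sketch of this idea, not the actual implementation in gettsim or dags: argument names are read with <code>inspect.signature</code>, and the standard library&rsquo;s <code>graphlib.TopologicalSorter</code> (Python 3.9 or newer) yields an execution order. The function <code>n_children</code>, the inputs <code>n_boys</code> and <code>n_girls</code>, and the benefit amount of 200 are made up for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import inspect
from graphlib import TopologicalSorter


def n_children(n_boys, n_girls):
    # Hypothetical helper: computes a quantity that is missing from the data.
    return n_boys + n_girls


def child_benefits(n_children, parameters):
    return n_children * parameters["child_benefits"]


functions = {"n_children": n_children, "child_benefits": child_benefits}
# Hypothetical input data and parameters.
data = {"n_boys": 1, "n_girls": 2, "parameters": {"child_benefits": 200}}

# Each function depends on the quantities named by its arguments.
graph = {
    name: set(inspect.signature(func).parameters)
    for name, func in functions.items()
}

# static_order() yields every node after all of its dependencies.
results = dict(data)
for name in TopologicalSorter(graph).static_order():
    if name in functions:
        kwargs = {arg: results[arg] for arg in graph[name]}
        results[name] = functions[name](**kwargs)

print(results["child_benefits"])
</code></pre></div>
<p>In this sketch, replacing an entry in <code>functions</code> or adding a new one changes the graph without touching the remaining code.</p>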
<p>Users can modify the collection of functions by overwriting existing functions or adding
their own.</p>
<p>You can find out more about the package in the
<a href="https://gettsim.readthedocs.io/en/latest/" target="_blank" rel="noopener">documentation</a> or check out the code to
build a DAG in the standalone package
<a href="https://github.com/OpenSourceEconomics/dags" target="_blank" rel="noopener">dags</a>.</p>
<h2 id="sidhahahugoshortcode-s5-hbhb"><a href="https://tobiasraabe.github.io/project/sid/">sid</a></h2>
<p>Last but not least, I have been working on an epidemiological model to predict the
spread of infectious diseases. It is my COVID-19 project with
<a href="https://github.com/roecla" target="_blank" rel="noopener">Klara</a> and <a href="https://github.com/janosg" target="_blank" rel="noopener">Janos</a>. It is called
<a href="https://github.com/covid-19-impact-lab/sid" target="_blank" rel="noopener">sid</a>, and we hope to publish something
soon.</p>
</description>
</item>
<item>
<title>Matplotlib for Publications</title>
<link>https://tobiasraabe.github.io/post/matplotlib-for-publications/</link>
<pubDate>Thu, 15 Aug 2019 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/post/matplotlib-for-publications/</guid>
<description><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">matplotlib</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</span></span></code></pre></div><h3 id="introduction">Introduction</h3>
<p>This post shows you how to make publication-ready plots with matplotlib.</p>
<h3 id="matplotlibrc-or-style-sheets">matplotlibrc or style sheets</h3>
<p>To align most stylistic choices, use a <a href="https://matplotlib.org/users/customizing.html" target="_blank" rel="noopener"><code>matplotlibrc</code></a> file or style sheets. The difference between the two is that <code>matplotlibrc</code> is looked up in a few fixed locations, starting with your working directory, and applied automatically to all your plots, whereas a style sheet has to be loaded explicitly.</p>
<p>If you want to have a more dynamic approach, use style sheets, which look like the following:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># content of file style.mplstyle
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">axes.axisbelow : True # Draw axis grid lines and ticks below patches (True); above
</span></span><span class="line"><span class="cl"> # patches but below lines (&#39;line&#39;); or above all (False).
</span></span><span class="line"><span class="cl"> # Forces grid lines below figures.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">font.size : 12 # Font size in pt.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">grid.linewidth : 1.2 # In pt.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">legend.framealpha : 1 # Legend patch transparency.
</span></span><span class="line"><span class="cl">legend.scatterpoints : 3 # Number of scatter points in legend.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">lines.linewidth : 3 # line width in pt.
</span></span></code></pre></div><p>First, here is the plot with the default settings.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/post/matplotlib-for-publications/matplotlib-for-publications_4_0_hub680afa9843cfeaa6f922846f203576d_14156_455077d3547fe93046300ebfed420590.webp 400w,
/post/matplotlib-for-publications/matplotlib-for-publications_4_0_hub680afa9843cfeaa6f922846f203576d_14156_686148d1412eaa274e73a047e339fa9c.webp 760w,
/post/matplotlib-for-publications/matplotlib-for-publications_4_0_hub680afa9843cfeaa6f922846f203576d_14156_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/post/matplotlib-for-publications/matplotlib-for-publications_4_0_hub680afa9843cfeaa6f922846f203576d_14156_455077d3547fe93046300ebfed420590.webp"
width="386"
height="248"
loading="lazy" data-zoomable /></div>
</div></figure>
<p>Now, apply the style and see the changes.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">style</span><span class="o">.</span><span class="n">use</span><span class="p">(</span><span class="s2">&#34;../style.mplstyle&#34;</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/post/matplotlib-for-publications/matplotlib-for-publications_7_0_hu347fff84ecbab4c76b1ceee77eee68c0_15716_bf42887be5d8624282c5649b6198b0de.webp 400w,
/post/matplotlib-for-publications/matplotlib-for-publications_7_0_hu347fff84ecbab4c76b1ceee77eee68c0_15716_38b906b832b331312d1b98a221e6d6cf.webp 760w,
/post/matplotlib-for-publications/matplotlib-for-publications_7_0_hu347fff84ecbab4c76b1ceee77eee68c0_15716_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/post/matplotlib-for-publications/matplotlib-for-publications_7_0_hu347fff84ecbab4c76b1ceee77eee68c0_15716_bf42887be5d8624282c5649b6198b0de.webp"
width="392"
height="251"
loading="lazy" data-zoomable /></div>
</div></figure>
<p>There are some other things you need to know about style sheets:</p>
<ul>
<li>You can use all the settings from <code>matplotlibrc</code>.</li>
<li>You can load multiple style sheets by passing a list, <code>plt.style.use([&quot;first_style&quot;, &quot;second_style&quot;])</code>, where overlapping options are overwritten by the later style.</li>
<li>In my experience, it is better in notebooks to load the style on a separate line from the imports because the changes are sometimes not picked up otherwise.</li>
</ul>
<h3 id="the-font">The font</h3>
<p>We would like the font in the plot to match the font in the text. Luckily, it is possible to use your <a href="https://matplotlib.org/3.1.0/tutorials/text/usetex.html" target="_blank" rel="noopener">LaTeX distribution</a> to compile the labels of the figures. In this case, my examination office required me to use Times New Roman, which is available for matplotlib but not for pdflatex. pdflatex has only Times Roman in the package <code>newtxtext</code>, but who will notice the difference :).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># These parameters can also be put into the style or matplotlibrc.</span>
</span></span><span class="line"><span class="cl"><span class="c1"># This is the dynamic approach of changing parameters.</span>
</span></span><span class="line"><span class="cl"><span class="n">nice_fonts</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="s2">&#34;text.usetex&#34;</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="s2">&#34;font.family&#34;</span><span class="p">:</span> <span class="s2">&#34;serif&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"> <span class="s2">&#34;font.serif&#34;</span> <span class="p">:</span> <span class="s2">&#34;Times New Roman&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="n">matplotlib</span><span class="o">.</span><span class="n">rcParams</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">nice_fonts</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/post/matplotlib-for-publications/matplotlib-for-publications_11_0_hu0cf6d0c541e71506d7188a2422ad4076_13723_b8cbac3a8674616fc1d16515bc937f71.webp 400w,
/post/matplotlib-for-publications/matplotlib-for-publications_11_0_hu0cf6d0c541e71506d7188a2422ad4076_13723_0ce838e5840092e50fca6dcf133c82fc.webp 760w,
/post/matplotlib-for-publications/matplotlib-for-publications_11_0_hu0cf6d0c541e71506d7188a2422ad4076_13723_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/post/matplotlib-for-publications/matplotlib-for-publications_11_0_hu0cf6d0c541e71506d7188a2422ad4076_13723_b8cbac3a8674616fc1d16515bc937f71.webp"
width="386"
height="249"
loading="lazy" data-zoomable /></div>
</div></figure>
<h3 id="the-plot-size">The plot size</h3>
<p>The plot size is critical because it sets the frame for all other options. For example, if we want to embed an image in our thesis and the font size of the labels should match the thesis text, we cannot scale the figure with <code>[width=\textwidth]</code> because scaling the figure also scales its fonts.</p>
<p>Furthermore, we want the plot size to be visually appealing, and the golden ratio is a good rule of thumb for the ratio between width and height. The golden ratio is approximately 1.618, so the height is the width divided by 1.618 or, equivalently, multiplied by about 0.618.</p>
<p>The following function is from <a href="https://jwalton.info/Embed-Publication-Matplotlib-Latex" target="_blank" rel="noopener">Jack Walton&rsquo;s blog</a> and returns the figure width and height in inches (the unit matplotlib uses) for a given figure width in points (LaTeX&rsquo;s measurement).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">set_size</span><span class="p">(</span><span class="n">width</span><span class="p">,</span> <span class="n">fraction</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="s2">&#34;&#34;&#34; Set aesthetic figure dimensions to avoid scaling in latex.
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2"> Parameters
</span></span></span><span class="line"><span class="cl"><span class="s2"> ----------
</span></span></span><span class="line"><span class="cl"><span class="s2"> width: float
</span></span></span><span class="line"><span class="cl"><span class="s2"> Width in pts
</span></span></span><span class="line"><span class="cl"><span class="s2"> fraction: float
</span></span></span><span class="line"><span class="cl"><span class="s2"> Fraction of the width which you wish the figure to occupy
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2"> Returns
</span></span></span><span class="line"><span class="cl"><span class="s2"> -------
</span></span></span><span class="line"><span class="cl"><span class="s2"> fig_dim: tuple
</span></span></span><span class="line"><span class="cl"><span class="s2"> Dimensions of figure in inches
</span></span></span><span class="line"><span class="cl"><span class="s2"> &#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl"> <span class="c1"># Width of figure</span>
</span></span><span class="line"><span class="cl"> <span class="n">fig_width_pt</span> <span class="o">=</span> <span class="n">width</span> <span class="o">*</span> <span class="n">fraction</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="c1"># Convert from pt to inches</span>
</span></span><span class="line"><span class="cl"> <span class="n">inches_per_pt</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="mf">72.27</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="c1"># Golden ratio to set aesthetic figure height</span>
</span></span><span class="line"><span class="cl"> <span class="n">golden_ratio</span> <span class="o">=</span> <span class="p">(</span><span class="mi">5</span> <span class="o">**</span> <span class="mf">0.5</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="c1"># Figure width in inches</span>
</span></span><span class="line"><span class="cl"> <span class="n">fig_width_in</span> <span class="o">=</span> <span class="n">fig_width_pt</span> <span class="o">*</span> <span class="n">inches_per_pt</span>
</span></span><span class="line"><span class="cl"> <span class="c1"># Figure height in inches</span>
</span></span><span class="line"><span class="cl"> <span class="n">fig_height_in</span> <span class="o">=</span> <span class="n">fig_width_in</span> <span class="o">*</span> <span class="n">golden_ratio</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="n">fig_width_in</span><span class="p">,</span> <span class="n">fig_height_in</span>
</span></span></code></pre></div><p>Finally, how do we find out the <code>\textwidth</code> in points inside our document? Just insert <code>\the\textwidth</code> at the position of the figure in your text and the value will be printed in your PDF document after compilation. If you are on a page in landscape mode, use <code>\linewidth</code> instead (thanks to <a href="https://tex.stackexchange.com/users/34505/john-kormylo" target="_blank" rel="noopener">John Kormylo</a> for <a href="https://tex.stackexchange.com/a/155627" target="_blank" rel="noopener">this</a>).</p>
<p>Assuming our thesis has a text width of 400pt, the resulting figure looks like this.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="n">set_size</span><span class="p">(</span><span class="mi">400</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">ax</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="" srcset="
/post/matplotlib-for-publications/matplotlib-for-publications_15_0_hu401cbb9b57c523aa9ba182bcc5fcf663_11193_e9931f80cd7d8686ca59fda3633c6eaa.webp 400w,
/post/matplotlib-for-publications/matplotlib-for-publications_15_0_hu401cbb9b57c523aa9ba182bcc5fcf663_11193_7751ca955f888feee0714ddbb1d8125b.webp 760w,
/post/matplotlib-for-publications/matplotlib-for-publications_15_0_hu401cbb9b57c523aa9ba182bcc5fcf663_11193_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://tobiasraabe.github.io/post/matplotlib-for-publications/matplotlib-for-publications_15_0_hu401cbb9b57c523aa9ba182bcc5fcf663_11193_e9931f80cd7d8686ca59fda3633c6eaa.webp"
width="354"
height="217"
loading="lazy" data-zoomable /></div>
</div></figure>
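<p>To make the numbers concrete, the following sketch restates <code>set_size</code> in compact form (same arithmetic as above) and evaluates it for the assumed text width of 400pt. The figure comes out at roughly 5.53 by 3.42 inches.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def set_size(width, fraction=1):
    """Compact restatement of the function above (same arithmetic)."""
    fig_width_in = width * fraction / 72.27  # TeX points to inches
    fig_height_in = fig_width_in * (5 ** 0.5 - 1) / 2  # width / golden ratio
    return fig_width_in, fig_height_in


width_in, height_in = set_size(400)
print(round(width_in, 2), round(height_in, 2))  # prints 5.53 3.42
</code></pre></div>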
<h3 id="vector-or-raster-graphics">Vector or raster graphics</h3>
<p>This is the last thing to know about graphics. <code>matplotlib</code> has many options to export graphics. You have probably used <code>png</code>, <code>jpeg</code> or <code>pdf</code> in the past. Just forget about the first two because they produce raster graphics instead of vector graphics. Vector graphics allow for almost infinite zooming whereas zooming into <code>png</code> and similar formats produces blurry and blocky pictures. <code>pdf</code> can be easily included with the general <code>graphicx</code> package of LaTeX.</p>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6b/Bitmap_VS_SVG.svg/300px-Bitmap_VS_SVG.svg.png" align="center"/>
<h3 id="conclusion">Conclusion</h3>
<p>That&rsquo;s it! This is everything you need for publication-ready plots with matplotlib :).</p>
<h3 id="references">References</h3>
<ul>
<li><a href="https://jwalton.info/Embed-Publication-Matplotlib-Latex" target="_blank" rel="noopener">https://jwalton.info/Embed-Publication-Matplotlib-Latex</a></li>
</ul>
</description>
</item>
<item>
<title>Numba - @vectorize and @guvectorize</title>
<link>https://tobiasraabe.github.io/post/numba-guvectorize/</link>
<pubDate>Sun, 14 Apr 2019 00:00:00 +0000</pubDate>
<guid>https://tobiasraabe.github.io/post/numba-guvectorize/</guid>
<description><p><em>Updated 16.08.2019. Better syntax, fewer errors and an example to show inlining of functions.</em></p>
<h2 id="why-should-i-use-them">Why should I use them</h2>
<p>Both decorators are powerful because Numba compiles the decorated function to machine code. That can make them significantly faster than the same operation written with Numpy, especially for expensive calculations on large inputs. Of course, this is also true for functions decorated with <code>@jit</code>, but the other two decorators enable additional features like reduction, accumulation and broadcasting (<a href="#Example-for-reduction,-accumulation-and-broadcasting">explanation</a>) which can make operations on bigger matrices even faster.</p>
<h2 id="when-should-i-use-them">When should I use them</h2>
<p>&ldquo;[P]remature optimization is the root of all evil&rdquo; (Donald Knuth in <em>Computer Programming as an Art</em>, p. 671). Before you start to optimize your implementation, use tools like <a href="https://github.com/rkern/line_profiler" target="_blank" rel="noopener">line_profiler</a> or <a href="https://github.com/jiffyclub/snakeviz" target="_blank" rel="noopener">snakeviz</a> to profile your code and find the real bottlenecks, not the ones you merely suspect. Often it is sufficient to find better implementations in the standard library or to rewrite parts such that they use Numpy or similar optimized frameworks. If you use Numpy, do not use loops but array operations. Use higher-dimensional arrays to gain performance by broadcasting. <a href="%7bfilename%7d/Blog/20190325-numpy-views-vs-copies.md">Avoid copies</a>. Also, refactoring and rewriting parts of the code may not give direct speed improvements, but your code becomes more compact and you can single out expensive operations.</p>
<p>Imagine you have done all this, but a few functions still take up the majority of the runtime. Next, decide whether you need a function that operates on single elements of an array or on whole arrays.</p>
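<p>As a rough sketch of the profiling step, the standard library's <code>cProfile</code> already shows where the time goes. The functions <code>slow_sum_of_squares</code> and <code>run</code> are hypothetical stand-ins for your own code.</p>

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately written as a Python loop so that it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run():
    return [slow_sum_of_squares(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

<p>If you prefer a visual overview, the same profile can be written to disk with <code>profiler.dump_stats("profile.prof")</code> and opened with <code>snakeviz profile.prof</code>.</p>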
<h2 id="using-the-vectorize-decorator">Using the <code>@vectorize</code> decorator</h2>
<p>The <code>@vectorize</code> decorator is for writing efficient functions which work on every element of an array. One useful application for me was to compute the probability density of <code>x</code> under a normal distribution with mean <code>mu</code> and standard deviation <code>sigma</code>. There already exists an implementation in scipy, but it is comparatively slow. Let us start to write our own implementation.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_prob_norm_dist</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="p">)</span> <span class="o">*</span> <span class="n">sigma</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">sigma</span> <span class="o">**</span> <span class="mi">2</span><span class="p">))</span>
</span></span></code></pre></div><p>We can compare our implementation with the scipy version to make sure it works as expected. Use the testing utilities from <code>numpy.testing</code> because the floating point results of two implementations rarely agree to the last digit. (Tip: Use <a href="https://docs.python.org/3.7/library/doctest.html" target="_blank" rel="noopener">doctest</a> to document and test your function at the same time. <a href="#Example-using-doctest">Here is an example</a>.)</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">norm</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_array_almost_equal</span><span class="p">(</span><span class="n">norm</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">get_prob_norm_dist</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">decimal</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span>
</span></span></code></pre></div><p>Now, create test data and measure the performance of our two methods.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">10000</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">timeit</span> <span class="n">norm</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>647 µs ± 47.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">timeit</span> <span class="n">get_prob_norm_dist</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>78.4 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
</code></pre>
<p>Surprisingly, we are already faster by a factor of 8. At this point you should ask yourself again if further optimization is really necessary and whether your time is better spent elsewhere - remember Donald Knuth! In this example we will go down the rabbit hole a little further. The next step involves rewriting the function with the <code>@vectorize</code> decorator.</p>
<p>The major new thing is that the decorator requires a signature or even multiple signatures which define the input and output types of the function. The output type wraps the input types with round brackets. There are two ways to define the signature: declare the types of the arguments with types from Numba or declare the types with a string. I prefer the latter as it keeps your import statements short. With <code>nopython=True</code>, Numba does not fall back to the slower object mode if compilation fails but raises an error, so we are notified. Also, the new function does not support default arguments.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">numba</span> <span class="kn">import</span> <span class="n">vectorize</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@vectorize</span><span class="p">([</span><span class="s2">&#34;float64(float64, float64, float64)&#34;</span><span class="p">],</span> <span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_prob_norm_dist_fast</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="p">)</span> <span class="o">*</span> <span class="n">sigma</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">sigma</span> <span class="o">**</span> <span class="mi">2</span><span class="p">))</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">timeit</span> <span class="n">get_prob_norm_dist_fast</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>248 µs ± 6.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<p>The new version actually requires three times the runtime of the Numpy version. But when using Numba, we need to code a little differently. As appealing as the one-liner may look, Numba can work better if we store intermediate results in separate variables and combine them later.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@vectorize</span><span class="p">([</span><span class="s2">&#34;float64(float64, float64, float64)&#34;</span><span class="p">],</span> <span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">target</span><span class="o">=</span><span class="s2">&#34;cpu&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_prob_norm_dist_fast</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">mu</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="p">)</span> <span class="o">*</span> <span class="n">sigma</span>
</span></span><span class="line"><span class="cl"> <span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">y</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">sigma</span> <span class="o">**</span> <span class="mi">2</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">probability</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="n">probability</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">timeit</span> <span class="n">get_prob_norm_dist_fast</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>248 µs ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
</code></pre>
<p>We do not gain any performance, but this is just a showcase. If the operation does not involve many calculations and the input is not big, you cannot expect more performance than plain Numpy. If we increase the number of calculations and change the <code>target</code> keyword from <code>cpu</code> to <code>parallel</code>, the Numba version comes close to plain Numpy.</p>
<p>With the <code>parallel</code> target, the compiled function is no longer restricted to one core but spreads the work across several threads, by default typically as many as there are cores on your machine.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@vectorize</span><span class="p">([</span><span class="s2">&#34;float64(float64, float64, float64)&#34;</span><span class="p">],</span> <span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">target</span><span class="o">=</span><span class="s2">&#34;parallel&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_prob_norm_dist_fast_parallel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="n">mu</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span><span class="p">)</span> <span class="o">*</span> <span class="n">sigma</span>
</span></span><span class="line"><span class="cl"> <span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">y</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">sigma</span> <span class="o">**</span> <span class="mi">2</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">probability</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="n">probability</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">10000</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">timeit</span> <span class="n">get_prob_norm_dist_fast_parallel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>2.72 s ± 161 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="o">%</span><span class="n">timeit</span> <span class="n">get_prob_norm_dist</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>2.32 s ± 66.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>
<p>Even if the resulting function is slower than the Numpy implementation, it can still make sense to write one. For example, in nopython mode you can only call Numba-compiled functions from other Numba-compiled functions. When this <a href="https://github.com/numba/numba/pull/2469" target="_blank" rel="noopener">PR for Numba</a> is finished, nesting functions will no longer be an obstacle, but until then it is not always possible. Also, Numba <a href="https://numba.pydata.org/numba-doc/dev/user/faq.html#does-numba-inline-functions" target="_blank" rel="noopener">inlines functions</a> which means the function and its calls are compiled as one function. So, you do not have to worry whether calling another function reduces performance.</p>
<h2 id="using-the-guvectorize-decorator">Using the <code>@guvectorize</code> decorator</h2>
<p>I found <code>@guvectorize</code> extremely useful and powerful, but at the same time hard to understand as I was not experienced in the more general concepts of array computation. The following example is a Monte Carlo integration which was my first point of contact with Numba.</p>
<p>Trying to understand what extrapolating from arrays to arrays of higher dimension means can be hard. Thus, we will start with the simplest example and then gradually expand the problem. Along the way, you will learn about the flexibility and to some extent elegance of <code>@guvectorize</code> :).</p>
<p>The problem is as follows: an agent faces four choices, each of which yields a constant utility plus a stochastic component which is i.i.d. and normally distributed. We want to find the expected maximum utility (emax) over all choices. It can be simulated by calculating the maximum utility for each of a number of draws; the average over these maxima yields the emax.</p>
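<p>Written as a formula, the simulation approximates an expectation by a sample average. The symbols are introduced here only for illustration: <code>u_j</code> is the constant utility of choice <code>j</code>, <code>N</code> the number of draws, and the shocks are taken to be standard normal because the code draws them with <code>np.random.randn</code>.</p>

```latex
\text{emax}
  = \mathbb{E}\left[\max_{j \in \{1, \dots, 4\}} \left(u_j + \varepsilon_j\right)\right]
  \approx \frac{1}{N} \sum_{i=1}^{N} \max_{j \in \{1, \dots, 4\}} \left(u_j + \varepsilon_{ij}\right),
\qquad
\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)
```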
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">static_utilities</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</span></span></code></pre></div><h3 id="1-simulate-emax-for-one-agent-with-numpy">1. Simulate emax for one agent with Numpy</h3>
<p>We start by simulating the emax for one agent with Numpy to showcase the normal solution. We create 1000 draws, add them to the deterministic component of the utility, take the maximum over the choices for each draw and then average over all maxima.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">n_draws</span> <span class="o">=</span> <span class="mi">1000</span>
</span></span><span class="line"><span class="cl"><span class="n">draws</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">n_draws</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">simulate_emax_np</span><span class="p">(</span><span class="n">static_utilities</span><span class="p">,</span> <span class="n">draws</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">utilities</span> <span class="o">=</span> <span class="n">static_utilities</span> <span class="o">+</span> <span class="n">draws</span>
</span></span><span class="line"><span class="cl"> <span class="n">max_utilities</span> <span class="o">=</span> <span class="n">utilities</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="n">emax</span> <span class="o">=</span> <span class="n">max_utilities</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="n">emax</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">simulate_emax_np</span><span class="p">(</span><span class="n">static_utilities</span><span class="p">,</span> <span class="n">draws</span><span class="p">)</span>
</span></span></code></pre></div><pre><code>4.247215674679622
</code></pre>
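<p>The addition inside <code>simulate_emax_np</code> relies on broadcasting: the utilities of shape <code>(4,)</code> are added to every row of the draws. The following sketch makes the individual steps visible with a tiny, hand-made array of draws instead of random numbers, so that the result can be checked by hand.</p>

```python
import numpy as np

static_utilities = np.arange(1, 5)  # shape (4,)

# Three hand-made "draws" instead of random numbers, only for illustration.
draws = np.zeros((3, 4))
draws[0, 0] = 5.0

utilities = static_utilities + draws   # broadcast to shape (3, 4)
max_utilities = utilities.max(axis=1)  # best choice per draw: 6, 4, 4
emax = max_utilities.mean()            # (6 + 4 + 4) / 3
```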
<h3 id="2-simulate-emax-for-one-agent-with-numba">2. Simulate emax for one agent with Numba</h3>
<p>Now, we will do the same thing with Numba. I will give you the function first and then we will discuss its components.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">numba</span> <span class="kn">import</span> <span class="n">guvectorize</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@guvectorize</span><span class="p">(</span>
</span></span><span class="line"><span class="cl"> <span class="p">[</span><span class="s2">&#34;f8[:], f8[:, :], f8[:]&#34;</span><span class="p">],</span> <span class="s2">&#34;(n_choices), (n_draws, n_choices) -&gt; ()&#34;</span><span class="p">,</span> <span class="n">nopython</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">target</span><span class="o">=</span><span class="s2">&#34;cpu&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">simulate_emax</span><span class="p">(</span><span class="n">static_utilities</span><span class="p">,</span> <span class="n">draws</span><span class="p">,</span> <span class="n">emax</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">n_draws</span><span class="p">,</span> <span class="n">n_choices</span> <span class="o">=</span> <span class="n">draws</span><span class="o">.</span><span class="n">shape</span>
</span></span><span class="line"><span class="cl"> <span class="n">emax_</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_draws</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">max_utility</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_choices</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">utility</span> <span class="o">=</span> <span class="n">static_utilities</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+</span> <span class="n">draws</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="k">if</span> <span class="n">utility</span> <span class="o">&gt;</span> <span class="n">max_utility</span> <span class="ow">or</span> <span class="n">j</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl"> <span class="n">max_utility</span> <span class="o">=</span> <span class="n">utility</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">emax_</span> <span class="o">+=</span> <span class="n">max_utility</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> <span class="n">emax</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">emax_</span> <span class="o">/</span> <span class="n">n_draws</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">emax</span> <span class="o">=</span> <span class="n">simulate_emax</span><span class="p">(</span><span class="n">static_utilities</span><span class="p">,</span> <span class="n">draws</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">emax</span>
</span></span></code></pre></div><pre><code>4.247215674679617
</code></pre>
<p>First, we will take a look at the decorator and the two new components.</p>
<ol>
<li>The first argument is the signature, a list containing type declarations of the inputs and the output. <code>f8</code> is just short-hand for <code>float64</code>; <code>f8[:]</code> represents a one-dimensional and <code>f8[:, :]</code> a two-dimensional array (<a href="https://numba.pydata.org/numba-doc/dev/reference/types.html" target="_blank" rel="noopener">types and signatures</a>). Note that the output type does not wrap the input types. Instead, it is appended to the end of the declaration. Furthermore, this function should return a scalar, which nevertheless has to be declared as an array in the signature. Later, we will only write to the first entry of this array.</li>
<li>The second argument is the layout which specifies the dimensions of the inputs and the output. You can use an arbitrary name for a dimension, but assign the same name to dimensions which should match. <code>()</code> represents a scalar and <code>(n_draws, n_choices)</code> an array where the first dimension is the number of draws and the second the number of choices. The return argument is separated from the rest with an arrow, <code>-&gt;</code>. Here the return is declared as a scalar.</li>
</ol>
<p>Now, we will examine the function. In the special case of gufuncs, the function does not return its result. Instead, the output array is appended to the arguments of the function and filled in place.</p>
<p>Instead of array operations, we are very explicit within the function and do everything with loops. For each set of draws, we add the deterministic and stochastic components and keep only the maximum which is stored in a temporary variable. In the end, the sum of maximum utilities is divided by the number of draws and the resulting average is written to the first entry of the output array to get a scalar output.</p>
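<p>The loop logic can be checked without Numba. The sketch below repeats it in plain Python under the made-up name <code>simulate_emax_loops</code> and compares the result with the array version. It uses negative utilities on purpose: because the running maximum starts at zero, the condition <code>j == 0</code> is what makes the first choice overwrite it.</p>

```python
import numpy as np


def simulate_emax_loops(static_utilities, draws):
    # Plain-Python copy of the loop logic, only meant as a reference.
    n_draws, n_choices = draws.shape
    total = 0.0
    for i in range(n_draws):
        max_utility = 0.0
        for j in range(n_choices):
            utility = static_utilities[j] + draws[i, j]
            # j == 0 resets the running maximum, so negative utilities work too.
            if j == 0 or utility > max_utility:
                max_utility = utility
        total += max_utility
    return total / n_draws


rng = np.random.default_rng(0)
negative_utilities = np.array([-4.0, -3.0, -2.0, -1.0])
draws = rng.standard_normal((1000, 4)) * 0.1

expected = (negative_utilities + draws).max(axis=1).mean()
result = simulate_emax_loops(negative_utilities, draws)
np.testing.assert_allclose(result, expected, rtol=1e-12)
```

<p>Without the <code>j == 0</code> check, the maximum would stay at zero for every draw in this example and the simulated emax would be wrong.</p>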