---
title: "Cobey Lab Handbook"
#author: "Sarah Cobey"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
output: bookdown::gitbook
documentclass: book
bibliography: [book.bib, packages.bib]
biblio-style: apalike
link-citations: yes
github-repo: cobeylab/lab_handbook
description: "Handbook of the Cobey Lab at the University of Chicago"
---
# Preface {-}
This handbook is partly for prospective or new lab members who want to know how we do things. The first chapters, where we lay out our principles and general style of work, are most relevant here.
The main purpose of this handbook, however, is to help current lab members share knowledge, develop skills, and get things done. Contributions welcome.
***Contributors:**
By Sarah Cobey. The Midway section was originally written by Ed Baskerville, with updates by Rachel Oidtman. The coding section is by Alex Byrnes.*
<!--chapter:end:index.Rmd-->
# Why we work {#why}
As a lab, we have two equally important goals:
1. to perform high-quality research and
2. to create an environment that accelerates the growth of scientists and improves the practice of science.
Performing high-quality research means that we undertake meaningful problems, investigate them in a rigorous way, and promptly publish our results so that others can easily pick up where we leave off.
Creating an environment that promotes the growth of scientists means that we support the expansion of our own and one another's research capacities, especially through persistent effort, curiosity, and constructive criticism.
These practices in themselves should improve the way science is done, but we also have broader responsibilities to promote equity, limit unnecessary struggle and wasted resources, and perform research with the public good in mind.
This handbook outlines practices to help us achieve these goals. We'll always be improving our methods! If you work here, you can help.
<!--chapter:end:01-why.Rmd-->
# How we work
## Principles
**Do ambitious research.**
Research always seems to take more time than it should, so spend your time on important questions.
Think hard about what *should* be done, not only what can be done.
Try not to let others define subfields and questions for you.
Be deeply practical in evaluating your progress and choosing your next steps, but work toward lofty aims.
**Learn fast and change direction when necessary.**
Research involves making mistakes, or at least doing things that seem really dumb in retrospect.
Learn as much as possible from your failures.
(Could you have found that bug earlier? Learned about that other technique earlier?)
Do not shame yourself for them.
Instead, admit them, and document this learning for yourself and others by talking about it or potentially adding some advice to the handbook.
Note that failing is not the same thing as not having the result you wanted---it's a good day when your hypothesis is not supported and you get to learn something about how the world works.
Frequently reevaluate your approach and the direction of each project, and take initiative in doing this.
Take initiative as a collaborator (middle author) too.
**Know your corner of the literature.**
It makes you much smarter and can save enormous time in the long run.
It also makes it easier to spot good opportunities and unanswered questions.
Knowing the history of work on your problem inside and out is a requirement for first authors.
Develop a scientific reading habit if you don't have one yet.
As a rough guideline, graduate researchers and postdocs should read five papers a week on average, and skim more.
**Be open to collaboration, and respect your collaborators.**
Getting anything worthwhile done in research requires learning from others, including through papers, talks, and whiteboard time.
Be proactive in thinking about who might have relevant expertise.
Ask for help, give help, and carefully acknowledge the contributions of others.
Clarify expectations when you start on a project: make agreements explicit (and for important things, in writing), with expectations and timelines, and be reliable.
For instance, let your collaborators know when they can expect to hear from you with new results, drafts, etc.
These principles hold for interactions inside and outside the lab.
By default, you should think of yourself as a collaborator on every project in the lab, and remain engaged.
**Communicate assertively.**
It's nice to hear from other people that they've benefited from your analysis, timeliness, criticism, etc.
Tell people whenever you can that you like what they're doing.
Consider emailing strangers when you like their paper or talk, and explain why.
On the flip side, it's frustrating to learn from a third party that someone is unhappy with something we said or did.
Assertive communication means we give each other direct, constructive feedback if we think something isn't right.
You can trust me not to negatively describe your behavior to others without speaking to you first.
More broadly, if you think something can be improved, speak up to the right people before complaining to others.
Be constructive by criticizing the idea, analysis, or behavior, rather than the person.
Communicating assertively and kindly usually takes practice.
**Don't be too narrow.**
Take time to play intellectually.
Participate in departmental seminars, go to talks in other departments, meet with people who seem to be doing interesting things, and read exciting papers that might not be obviously related to your current projects.
(Graduate researchers should aim to attend a departmental seminar and journal club each week.)
Start journal clubs and groups that you wish existed and would make time for.
Try to balance exploration with work on existing projects.
I don't know the right balance here, but it's worth trying to figure it out.
**Work hard but sustainably.**
Figure out sustainable habits for effective research.
What is sustainable is personal: avoid blindly adopting others' criteria.
Focus on habits, such as working set hours each day, before benchmarks (e.g., "Publish", "Get speaking invitation", "Be famous!!!11!!" etc.).
Resist the temptation to run from one deadline to the next, and think instead about how to make regular progress.
(Do this especially if (i) you're fresh out of undergrad or (ii) you've never done it before because you think you work best under pressure.)
Aim for 40 hours of focused work per week.
If you're not happy with your progress or productivity, *there's no shame in asking for help or ideas from others*, including your advisor.
For what it's worth, I do think it's possible to do great research while having a life. (I've seen others do it... lolz)
A great resource for sustainable habits is the [National Center for Faculty Development and Diversity](https://www.facultydiversity.org/).
They have online classes, weekly newsletters, and writing support groups.
The University of Chicago has a subscription, so you should be able to get free access.
**Be accountable.**
All work in the lab is collaborative, involving the blood, sweat, and tears (and hopes and dreams!) of multiple people, and often indirectly taxpayers.
Respect all these contributions by being a reliable, involved scientist.
Let collaborators know if your contributions will be delayed, if you think something should be done differently, or if you have concerns about quality.
Always look for better hypotheses and approaches.
Speak up if you see something that's not right.
Remember we'll all be dead soon enough, and this is our opportunity to help others.
```{r danse-macabre, out.width='50%', echo=FALSE, fig.show='hold', fig.align="center", fig.cap='A contrasting time scale of research'}
knitr::include_graphics("images/danse-macabre.png")
```
## Work hours
I will not judge your progress based on the hours you keep in the lab: what matters most is that you make substantive progress on the lab's goals.
I encourage you to figure out how you work best.
You thus have broad freedom to choose how you work, provided you communicate your plans to others and see the projects through.
But because we're human, and it's nice to see other, non-digitized human faces regularly, aim to spend at least three days a week (for most of the workday) in lab.
You don't have to communicate your schedule with me in advance, as long as you're around roughly this much and show up for meetings.
Undergrads and out-of-state lab members will have different arrangements, of course.
Take vacations!
Take good ones, show us the photos, and bring back tea, chocolate, and/or fine liquor.
Try to use all your vacation time.
You don't need approval from anyone before you select dates, but please try to give collaborators advance notice and consider their schedules.
Mark the days you plan to be away on the lab calendar.
Please stay home if you're sick. You will be judged for this one.
Non-hourly, benefits-eligible employees (including graduate students) are entitled to various forms of parental, personal, and family leave, and I encourage you to take them if you need them.
If you need other accommodations, please let me know.
## Workspace
The lab space is supposed to help you work efficiently and happily: I want it to be a place where people can reliably go to get stuff done.
* Keep the main room quiet.
If others are present, have meetings (in person or via Skype) in an adjacent conference room.
* Please feel free to customize your space. Adjust the location and height of your desk and file cabinet as you see fit. Let me know if you'd like a privacy screen, a fan or space heater, etc.
* Check with others before bringing pets to work.
* Consult others before making dramatic changes to the lighting or temperature.
* Keep things clean. Wipe your desk. Wipe the kitchen counter. Do not wipe crumbs on the floor. Clean up spills. Gently encourage others to do the same. Bad habits kill mice.
* To make the room easier to clean, avoid letting your stuff overflow past your desk. Don't leave piles of stuff on the floor for more than a few days.
* Please let me know if there are ways the space can be more comfortable, or if there are particular things (e.g., new computer or software) that would improve your work.
* Lock the door if you leave and no one is in the lab.
## Communication
Please limit the use of email for research questions and discussions.
Use Asana instead---it makes things much easier to find in the long run, and it has no overeager spam filter.
Asana is also the best tool for lab announcements and discussions.
I do not expect you to check email or Asana on weekends, vacations, sick days, or holidays.
I'll always try to respond to your communications within 24 h, excluding weekends, vacations, sick days, and holidays.
In general, you should check email and Asana a few times a day and try to respond to urgent requests within an hour or two (M-F), but I expect urgent requests to be few and far between.
Such requests will usually come with some warning (e.g., an impending paper or grant deadline).
If I'm in my office and the door is open, you're welcome to come in to talk.
If the door is closed, it means I'm working, and it's best to communicate via Asana.
## Weekly check-ins
Most weeks, you'll meet individually with me to discuss research and occasionally other topics.
The agenda of each meeting is largely up to you.
However, I ask you to come prepared with slides, an updated summary, or a notebook (Rmd, Jupyter, etc.) concisely describing your progress since the last meeting.
You can choose whichever format works best for you, but somehow, your notes should clearly state (i) what goals you had set last week for this week, (ii) your progress on each goal, and (iii) what you think comes next.
(This structure is really helpful for me.)
You should also be prepared to show system and unit tests, or some kind of validation, to convince me your research results are correct.
The meeting is a time both to dig into the weeds and to think about the big picture.
I'll try to remind you of this, but one of my main goals for each meeting is to learn how I can best support you, during the meeting and after.
If our relationship needs adjusting, please let me know.
If you want to talk about career goals, that's fine too.
This is your time, but please be organized about it.
When I'm traveling, these weekly meetings may need to be rescheduled or occur over the phone.
Please always feel free to request a meeting when I am traveling, unless I'm on vacation.
## Lab meetings
We'll meet weekly as a lab.
Meetings start with various announcements of abstract deadlines, cool upcoming talks, new papers, etc., and then we briefly update one another on our research (roughly 1-5 min per person).
The point of these updates is to practice describing our research and especially to keep each other involved in our work, which includes providing helpful suggestions.
The rest of the meeting is usually dedicated to an in-depth presentation and discussion of one of our research projects or a discussion of a paper.
Plan to present your research once per quarter and to lead at least one paper discussion per quarter.
Pick papers at least a week in advance so people have time to read them.
Everyone is expected to show up having read and critically thought about the paper.
If someone is presenting, everyone else is expected to make helpful suggestions.
## Daily "Standup"
We use a Slack app for quick daily check-ins. In the morning, everyone receives a prompt to list the work they're planning for the day. Everyone's responses then appear at 10 AM Central for others to read. Checking in is optional, and you may skip the prompt with or without explanation. We use this to start conversations and to build the habit of formulating our intentions in advance.
## Reproducible research
You have broad freedom in most aspects of how you work, but there are certain protocols we follow to keep our work reproducible, accessible, and organized.
**Reproducible** means that other researchers could use your notes and code to reconstruct your results precisely, without guesswork or manual labor. All of the figures and results in any manuscript must be fully reproducible by executing one or a few scripts in a public GitHub repository. It should also be easy to reproduce intermediate results during development. In practice, this means we use version control, specifically [git](https://git-scm.com/book/en/v1/Getting-Started-About-Version-Control).
**Accessible** means that (i) all of your code, including small scripts, is maintained in a git repository that is regularly (e.g., daily) synced to the lab's GitHub account; (ii) all raw data and major results are stored on Midway projects/cobey (unless other arrangements are required by IRB); (iii) project management is visible to all lab members on Asana; and (iv) you regularly back up your laptop using an external hard drive *and* CrashPlan Pro.
**Organized** means that you keep your project files organized, use version control, document your code, include unit and system tests, use Asana and/or notebooks to record all decisions in your analysis from day to day and week to week, and you refactor code when it stinks. It also means you communicate progress promptly to collaborators in meetings and (for external collaborators) emails.
Specific suggestions are in Section \@ref(so-you-wanna).
<!--chapter:end:02-how.Rmd-->
# Performance
## Reviews
There should be informal reviews at every weekly meeting.
The point is mutual feedback, i.e., we can discuss your progress and develop achievable goals for the next few days to years, and you can tell me if there are areas in which I can provide better help.
If you would like more formal progress reviews, let me know.
Please also let me know if you are ever worried about your progress.
Postdocs and salaried researchers will get formal annual reviews as required by the Biological Sciences Division, but they are really secondary to the regular meetings (i.e., there should be no surprises).
## My commitments
* Meet with you at least biweekly (usually weekly) to discuss research and professional opportunities.
* Help give you an accelerated introduction to the field.
* Provide rapid (within days) critical feedback on research ideas and drafts---*with advance notice!*
* Help you establish relationships with other scientists in the field.
* Promote your work in conferences and meetings.
* Help you attend conferences and meetings.
* Help identify areas of professional growth.
* Provide teaching and mentoring opportunities, if desired.
* Fund you for at least $n$ years (as agreed), assuming steady progress.
* Be a trustworthy, reliable, honest, hard-working, constructive, respectful, and communicative colleague.
* (For postdocs) Help you identify a line of research to continue when you leave the lab.
## Basic expectations
* Follow the lab's principles (Section \@ref(principles)) and all our described work practices, including the project management and programming techniques described in Section \@ref(so-you-wanna).
* Take full intellectual ownership of your research, i.e., think hard about whether you and your collaborators are doing the right thing, search for relevant papers, and push your projects forward at a good pace.
* Develop annual and long-term professional goals as soon as you join the lab, and discuss them with me then and regularly thereafter. Let me know whenever your goals change. (It's okay if your long-term goals are amorphous; just let me know.)
* Work steadily, understand how you work, and let me know how I can help you work better. (See Philip Guo's [list of performance bounds](http://www.pgbovine.net/human-bounds.htm) for examples.)
Please let me know especially if my availability, the environment, software, or hardware are slowing you down.
* Learn from your mistakes. Programming bugs, bad writing, awkward slides, and undiscovered papers are all unfortunate parts of research. Forgive yourself *and* take corrective action to reduce the error rate in the future. Of course, the optimal error rate is usually not zero... The only *real* mistakes are blowing off or ignoring what people (reviewers, coauthors, committee members, me, etc.) are telling you, ignoring data related to your research or performance, and being a jerk.
* Perform lab service, as agreed upon (e.g., maintain the lab calendar, order office supplies, water the plants).
## Graduate researchers
I've listed below the skills I think graduate researchers should have by the time they defend their PhD.
Your adviser and committee will help steer you, but you are in control.
(I kind of dislike the "student" convention, tbh. You're scientists, just less experienced ones.)
* *Intellectual independence and mastery*
    * Be able to define a coherent field of study, including the progress that has been made in it and the problems that remain.
      This requires following the literature by regular, self-directed reading (at least five papers a week, conservatively, on average).
    * Have enough statistical and general knowledge to assess the strength of evidence of (almost) any study or general claim in this field.
    * Propose and carry out tractable, meaningful studies.
    * Identify new questions you want to answer and have an idea of how to address them.
    * Have a demonstrated history of acquiring skills through self-driven instruction and self-initiated collaborations.
* *Intellectual contributions*
    * Publish at least two papers on which you're first author.
      These papers should be submitted by the time you defend. Note that this is not a requirement of the UChicago MSTP or E&E programs, but I think it is an important minimal target.
    * Give talks outside the department and handle questions about your work.
    * Collaborate on projects on which you're not the first author.
    * Ask public questions during conference talks and seminars.
* *Toughness*
    * Practice feeling clueless regularly and getting over it, especially through learning.
    * Adapt projects to deal with unexpected outcomes.
    * Learn how to handle diverse forms of criticism and professional conflict.
* *Service*
    * Be able to criticize constructively in any situation.
    * After publishing, start reviewing manuscripts for journals.
    * Understand the social and political context of scientific research.
    * Practice sharing your work with broader audiences, e.g., via blog posts, talks to the public, and interviews.
    * Seek funding opportunities and apply for grants.
    * Establish your reliability in communicating with committees, collaborators, and administrators in a timely, respectful way, and follow through on your commitments.
Please note that you are ultimately responsible for ensuring you are meeting the requirements for your degree. The Student Advisory Committee (SAC) and later your thesis committee will help you with the planning, but you should take initiative in scheduling and planning ahead.
## Postdoctoral researchers
Generally, postdocs should have facility with all of the skills listed above and should also:
* Develop new research projects and manage external collaborations.
* Drive projects forward in consideration of existing and ongoing research in the field.
* As negotiated, potentially take a major role in managing research performed under federal contracts, including the completion of monthly progress reports.
* Mentor junior researchers. This can be formal, but postdocs should also be providing more constructive comments across the board to other researchers, including junior ones.
* Teach selectively, if interested in a teaching position.
<!--chapter:end:03-performance.Rmd-->
# So you wanna...
## Join the lab
We welcome applications from skilled, ambitious, and independent researchers at all levels, as long as they are burning to do good research promptly.
**Undergraduate students** interested in joining the lab generally need to be proficient (not brilliant) in at least one programming language, such as Python, R, Matlab, C(++), or Java, and have some biological background or curiosity in at least one research area. You should also be proficient in basic statistics.
**Prospective graduate students** are encouraged to review the details of the graduate program and the research described here. They may wish to consider working as research assistants in the lab to ensure a good fit before applying to the graduate program.
**Rotation students** must start with strong quantitative and some programming skills.
**Senior researchers, research programmers, and postdoctoral fellows** are also welcome to contact Sarah about opportunities for support and collaboration. We are especially looking for more postdocs to study the evolutionary dynamics of adaptive immunity.
All who are interested in joining the lab should explain in their initial communication what skills they could bring to the lab and what they hope to obtain from collaborating. It is essential to have read recent papers in the relevant research area, including some from our group, and to have an idea of the kind of questions or problems that excite you.
## Do some research
```{r owl-plot, out.width='50%', echo=FALSE, fig.show='hold', fig.align="center", fig.cap='Basically, except less sequential'}
knitr::include_graphics("images/owl.png")
```
**Identify a good question:** This can take some time. Talk with other people, read, keep talking, study patterns, reason from first principles, and keep talking. What phenomena are you trying to explain? Generate many questions. If you're unsure which one to pursue, sketching the next step (a game plan) for a few candidates can help you pick one.
**Develop a game plan:** Posit some answers to your question. What do those answers imply? What patterns or processes are (in)consistent with them? How can you test them? (And how can you make sure you're testing them correctly, i.e., that your analysis is correct?) Draft some approaches. Prioritize a few. Add the actionable tasks to Asana. (You can keep backup ideas there too in a separate section.) Read up on [project management](http://thenewpi.blogspot.com/2018/03/why-you-should-care-about-project.html).
**Set up a notebook:** Create a version-controlled "lab notebook" in which to record your progress, which includes your thinking, notes from papers, and your analyses. There are many fine ways to do this: what's most important is that the notebook is organized and that you use it. You could use Asana and Overleaf (LaTeX), an R Markdown file, or a Jupyter notebook. The latter two are probably the most seamless, but it doesn't matter too much. Be sure to keep files (notes, data, scripts) in a repository synced to the lab's GitHub account. Everything you do should be traceable and reproducible in some way; no quick "one-off" figures that exist only on your laptop.
**Understand context and constraints.** If you're working with data, there might be IRB restrictions on how it can be used, stored, and shared. Find out and comply. Also ask how the work is funded, if you do not yet know, and what kinds of reporting requirements and deadlines we may have. Contracts often require monthly progress reports; those for grants are less frequent. Identify any collaborators and make a plan for working with them.
**Have the right attitude:** As long as you're reasoning based on evidence, you're making progress. See [Schwartz (2015)](http://jcs.biologists.org/content/128/15/2745). Not all projects should move ahead. This is why it's useful to step back, reassess, and discuss your work with others. Revisit and revise your previous questions.
## Code well and efficiently
See the [coding handbook](coding-handbook.html#coding-handbook).
## Write good
### General advice
* One of the best ways to write good papers is to read lots of good papers.
This is more comfortable than learning incrementally from rejections. Also, there are useful books and essays on the subject (see below).
* For grants and papers, the central challenge is to articulate an interesting question and show how you have helped or will help answer it. Practice doing this from the beginning of your research project, as you sketch ideas and results in your notebook.
* **Be clear.** Use topic sentences. Assume your reader is an intelligent first-year graduate student, but with less time on her hands. Try to state things vividly and directly. Your writing will almost always improve if you try to explain your reasoning as transparently as possible.
* Focus on ideas, not people or studies. Avoid "Many studies have shown X." Just state what has been shown and give references.
* Be consistent. If you define a parameter, refer to it the same way throughout your paper. This holds for all sorts of annoying punctuation and formatting conventions. Channel the reader's attention into one clear, fascinating story, and let nothing distract from it.
* I like Claus Wilke's advice about knowing, when you sit down to write, whether you are drafting or revising/editing. If you are drafting, don't worry so much about the flow. Just get the ideas down. Under no circumstances should you show me that draft, however.
* Recommendations: ["Why Academics Stink at Writing"](https://stevenpinker.com/files/pinker/files/why_academics_stink_at_writing.pdf), [*The Elements of Style*](https://www.bartleby.com/141/index.html) (which I really like, contrary to [Claus](https://serialmentor.com/blog/2017/11/12/move-over-Strunk-White)), ["Ten Simple Rules for Writing Research Papers"](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003453), [Sarnecka Lab's "Writing Workshop"](https://sarneckalab.blogspot.com/2018/07/writing-workshop-table-of-contents.html)
### Initial submissions
The following workflow seems to work well, but if you prefer another one, let me know.
1. Start yesterday by writing summaries of the research in your lab notebook as you go. You can probably write most of the methods and many results this way. You'll also accumulate the major points for the intro and discussion in your notes too.
1. Identify a target journal in consultation with me and coauthors. Check the journal's instructions for authors so you know how to structure your manuscript.
2. When it is time to write the manuscript, set up a repository synced to Overleaf. Draft the figures and abstract first. Discuss the arc with me and potentially other coauthors. Sign up for a future lab meeting or theory meeting, in which we'll discuss the complete draft (which you'll need to distribute a week in advance). Propose an ordered author list and discuss with me.
3. Next write an outline---with the main steps in the argument as topic sentences, so it's really possible to trace the whole story---with each result in its own subsection and figures inline.
4. Draft the results section. Ask another person in the lab for feedback and then revise, and then show me. *N.B. I have a lamentably hard time looking past poor construction and focusing on the ideas, so I appreciate it when the writing is clear, organized, and not too laden with typos, even at these early stages.*
5. Next write the methods, introduction, and discussion, and revise the abstract. Do not forget to acknowledge Midway (assuming you used it) and funders. Funders often require specific language. Investigate.
6. Ask for more feedback from a labmate. Revise. Share with me.
7. We'll then discuss the manuscript in a lab meeting, and revise. Now is a good time to propose to the coauthors a schedule for when you'll send them a draft and when they'll return comments.
9. We only send drafts to coauthors and "friendly reviewers" when the writing is coherent and flows well. We do not want to waste their time.
10. Ensure your repository is well documented and up to date, and that all the analysis---including the figures---can be run from the included code with minimal effort. We make the repository public when we submit the manuscript to a preprint server. Often it's useful to publish a fresh repository rather than the one in which you developed the project. Double-check that you are not sharing protected data.
11. Ask a colleague to try to run your code, given the manuscript and access to the repository, and no other help from you.
12. When the final manuscript is ready for coauthors' approval, confirm with them (if you haven't already) their funding sources and their preferred and non-preferred reviewers. Make sure none of the coauthors has a COI with any of the preferred reviewers.
13. Draft the cover letter, or whatever introductory text the journal requires.
14. When all coauthors have approved the manuscript, we submit it to a preprint server (without journal-specific typesetting), publish the repository, and submit to a journal.
15. Celebrate.
### "Mature" papers
1. Ideally at least 20 min have passed before we receive a decision from the journal. (That's the fastest rejection I've heard of anyone receiving from *Nature*.) If we receive a rejection without review (a "desk rejection"), we probably need to improve our abstract, introduction, and/or cover letter. If we're rejected after review, we'll take 2-3 days to consider the reviews and make a plan for revisions. Be sure to communicate decisions to coauthors, if the journal does not email them automatically, and to let the coauthors know the plan for responding to the decision.
2. If the journal requests a revision, we track changes (using colored text) and write a polite and succinct reply. I like replies that are as self-contained as possible, so that months after having done the initial review, the reviewer can read over the reply and be sufficiently reminded of the context for each criticism that she or he doesn't need to reread the whole paper. In the reply, we describe exactly what has been changed in the paper and quote the changed text (with as much context as necessary, including corresponding line numbers). Ask other lab members for sample replies to get a sense of the tone and format.
2. It is to our advantage to revise quickly. With greater distance from the paper, reviewers can easily take issue with new parts. Novelty tends to decrease with time, even if nothing has been published in the interim.
3. Ask coauthors in advance if they could inspect revisions and the reply during some time window---they should know when to expect our draft.
4. When we resubmit to the journal, we *immediately* upload the revised manuscript (without journal-specific typesetting) to the preprint server. Once the manuscript is accepted, it's usually too late to post the revised version there. We would then have to decide between paying exorbitant OA fees to the journal or leaving the article behind the journal's paywall, which pushes readers toward the outdated preprint. It's best if the accepted version is already on the server.
5. Remember that rejections and revisions are [part of the game](https://twitter.com/dsquintana/status/1053898526667739136).
## Email like a pro
* **Be concise.**
Be clear if you are asking for something, or if you are simply giving information.
Try to minimize the number of back-and-forths required: instead of asking if someone is free to meet next week, list blocks of time you are available and propose a location.
Make it easy to reply quickly.
* Rather than send large attachments, **send a link to a Box or Dropbox file**.
* **Be polite**. Being concise is part of being polite, but being polite also means using professional titles and spell-checking your email.
Striking the right tone can be hard sometimes.
One error that very junior scientists sometimes make is being excessively deferential ("I was wondering maybe if you might consider...").
## Make nice figures
Mostly, see Claus Wilke's excellent online guide, [Data Visualization](https://serialmentor.com/dataviz/).
Some immediate suggestions:
* Label your axes. All parameters should be spelled out and accompanied by their symbol (e.g., "Transmission rate, $\beta$").
* Save figures in vector formats (e.g., PDF or SVG).
* If many points are plotted over one another, consider semi-transparency or plotting densities.
* Minimize wasted space in figures while ensuring your axis limits are appropriate (e.g., that fractions ranging from 0 to 1 have y-axis ranges of [0,1]).
* Titles are frequently a waste of space, but be sure to include key information somewhere nearby, e.g., what the shaded area represents, how you assessed significance, etc.
* To increase accessibility, avoid relying on red/green contrasts.
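As a minimal illustration of several of these points, here is a Python/matplotlib sketch; an R/ggplot2 version would follow the same steps. The function name `plot_final_size`, the data, and the file name are hypothetical. The sketch spells out the parameter and gives its symbol on the axis, uses semi-transparent points, fixes the y-axis of a fraction to [0, 1], uses a blue from the Okabe-Ito palette rather than a red/green contrast, trims white space, and saves a vector file.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def plot_final_size(betas, final_fractions, path):
    """Scatter a fraction against a parameter (hypothetical example data)."""
    fig, ax = plt.subplots(figsize=(4, 3))
    # Semi-transparent markers keep overplotted points visible.
    # "#0072B2" is the Okabe-Ito blue, which avoids a red/green contrast.
    ax.scatter(betas, final_fractions, alpha=0.3, color="#0072B2")
    # Spell out each quantity and give its symbol.
    ax.set_xlabel(r"Transmission rate, $\beta$")
    ax.set_ylabel("Fraction ever infected")
    # A fraction can range from 0 to 1, so show exactly that range.
    ax.set_ylim(0, 1)
    fig.tight_layout()  # trim surrounding white space
    fig.savefig(path)  # format (PDF, SVG, ...) is inferred from the extension
    plt.close(fig)
    return path


out = os.path.join(tempfile.mkdtemp(), "final_size.svg")
plot_final_size([0.1, 0.2, 0.3, 0.4], [0.0, 0.2, 0.6, 0.8], out)
```

Changing the extension of `out` to `.pdf` produces a PDF instead.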
## Keep up with the literature
Keeping up with the literature involves two challenges: finding papers and reading them. I've described what I do, but there could be better solutions.
**Finding papers**:
1. Use an RSS reader. Subscribe to major journals and bioRxiv and arXiv topics. Skim the titles and abstracts when you have a few minutes here and there. This will probably identify 90% of the *new* papers you might care about.
2. Set up Google Scholar alerts so you get emails when particular papers and people are cited.
3. Do a thorough literature search in any area of interest so you can identify relevant papers published decades ago. It is amazing how rapidly a field can forget phenomenal work.
**Reading papers**:
1. Just block off the time on your calendar and do it.
2. Consider setting up a small reading group to discuss more challenging papers. (For easier papers, this can double the time it takes to read them.) Also take advantage of theory group and lab meetings to force these discussions.
## Get funded
1. Aggressively search for opportunities. Ask the grad student and postdoc coordinators, ask your peers, do searches, etc. Periodically recheck.
2. Work backwards from deadlines, giving yourself much more time than seems necessary, and establish a schedule. Considerations: (i) Ask Linda if the grant will need to be reviewed by the **URA** and find out what their deadline is. This is the effective deadline (it's usually about two weeks before the submission deadline). (ii) **Letter writers** usually need at least three weeks before then, and it is best if you can give them a good copy of your research proposal by then too. (iii) Depending on the complexity of the application, we may need four or more weeks to **bounce drafts back and forth**. (iv) For applications with mentoring plans, it's especially useful to get started several months ahead, so we can identify if **another sponsor** should be brought on board. (v) It's also important to start early if we're unsure the research will be a **good fit** for the funder: we need the time to revise our pitch in coordination with the program officer. (vi) Many grants require **preliminary data**, and it's good to figure out what that should be early.
3. Identify several people who will read a draft of your proposal. It's best if some of them have been successfully funded and if they are not in your subfield. (Ideally, they'd be just like the review panel.) Ask how long they'll need with the proposal to give you comments, and work backwards to figure out when your draft needs to be ready.
4. Obtain copies of as many successfully funded applications of this type as you can, ideally with their summary sheets or reviewer comments.
Promise confidentiality.
You can search for funded federal grants on the [NIH RePORTER website](https://federalreporter.nih.gov/).
It's also worthwhile just asking around.
4. Read *[4 Steps to Funding](https://www.amazon.com/Funding-Rejection-Funded-Simple-Formula/dp/0615505589)* in its entirety before drafting anything.
It should only take a few hours.
We have a copy in the lab somewhere.
5. Study the call/grant description carefully and study the funded applications.
What consistencies appear?
Potentially consult with program officers and other applicants to make sure you understand what reviewers are looking for.
6. Write the proposal, and get funded! No, seriously: we'll discuss proposal-specific details in person. But as a mentor once told me, people generally like things in proportion to how well they understand them, so you want to make sure the proposal is really exciting---see *4 Steps to Funding*---and really, really clear. This is why we ask people outside our subfield to give us comments.
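The "work backwards" arithmetic in step 2 can be sketched in a few lines of Python. The lead times are assumptions read off the rough guidance above: the URA deadline about two weeks before the funder's deadline, letter writers getting a good draft three weeks before the URA deadline, and four weeks of trading drafts before that point. The function name `work_backwards` and the example date are hypothetical, and each grant's actual rules take precedence.

```python
from datetime import date, timedelta

# Assumed lead times, based on the rough guidance above; check each grant's rules.
URA_LEAD = timedelta(weeks=2)     # URA deadline before the funder's deadline
LETTER_LEAD = timedelta(weeks=3)  # letter writers need a good draft this long before the URA deadline
DRAFT_LEAD = timedelta(weeks=4)   # time to trade drafts before letter writers get that draft


def work_backwards(funder_deadline):
    """Return (date, milestone) pairs, earliest first."""
    ura_deadline = funder_deadline - URA_LEAD
    letters_date = ura_deadline - LETTER_LEAD
    return [
        (letters_date - DRAFT_LEAD, "Start trading drafts"),
        (letters_date, "Send a good draft to letter writers"),
        (ura_deadline, "URA deadline (the effective deadline)"),
        (funder_deadline, "Funder's deadline"),
    ]


for when, milestone in work_backwards(date(2024, 3, 1)):
    print(f"{when.isoformat()}  {milestone}")
```

For a funder's deadline of 1 March 2024, this puts the URA deadline on 16 February, the draft for letter writers on 26 January, and the start of drafting on 29 December 2023. Padding any of these further is in keeping with "much more time than seems necessary."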
## See our funding
We keep copies of funded and unfunded grants in the "Grant proposals" project on Asana.
*Assume these grants are confidential; do not share them outside the lab.*
Feel free to ask me about them if you have questions, and if you write a fellowship proposal, please add your submitted proposal (excluding the budget) to the project.
## Review your peers
I'm assuming you've already been invited to review a paper.
If you've not, there's not too much you can do, aside from publishing.
If you make positive comments about unpublished work at a meeting, there's a chance the authors will suggest you as a reviewer.
If you make smart comments, there's a chance editors will notice.
Rest assured I'm always on the lookout for papers I can invite lab members to review with me or in my place.
If you've been asked to review a paper,
* Ensure you do not have a conflict of interest. Check the journal's policy, but regardless, look in your heart of hearts, and do not overestimate your impartiality. You should decline to review papers by friends and recent collaborators. I also decline to review manuscripts that seem to be directly "competing" with my own, i.e., they are tackling the same question using similar methods. They're probably not really competing, but if I feel they are, that's enough to disqualify me.
* Confirm you can make the deadline. If you need more time, it's better to ask the editor for extra time now.
* It might be a good idea to ask whether your review is meant only as a backup. Asking seems rude, and probably usually is, but I've twice agreed to review manuscripts for journals only to be told *after I'd agreed but before the deadline* that they had received enough reviews and no longer needed mine. Except under extraordinary circumstances, that practice is impolite, and both times it wasted hours of my time.
* Read over the journal's criteria for judging manuscripts. For some journals, novelty is unimportant, or there's no requirement to work with empirical data. It's really annoying to be held to standards that the journal itself does not endorse; the editors often don't appear to recognize when this is happening. (Nope, no baggage here!)
* Start your reviews with a succinct summary of the manuscript, placing it in the context of other work in the field. This helps the editor, who might not understand the paper so well, and also shows that you understand the paper. Directly discuss what the paper contributes or could contribute and the extent to which the paper satisfies the criteria important to the journal.
* Next review the major strengths (sic) and weaknesses of the paper. Be very clear about what makes and/or doesn't make the paper convincing.
* Give evidence for your views. Especially regarding claims of novelty, cite! One of the most maddening things is to get a review saying, "Yawn, this has basically been done before," with no references. Citations also help the authors improve their work quickly, especially if you're suggesting relevant papers they've missed.
* In general, do not punish the authors for not doing the study you would've done or think should be done. Focus on what the paper *does* contribute or could contribute with minor or moderate changes.
* Do not recommend acceptance, major revisions, minor revisions, etc., directly in the body of the review. That recommendation is for the editor to make. Your job is to help the editor make a decision and to help the authors understand your impression of their paper---both what's good about it and what can be improved.
* Be constructive. *Never* be snarky or sarcastic. Imagine this is the first review the first author is receiving, or that the authors have feelings.
* Let the editor know in your review or confidential comments if you do not feel qualified to judge certain parts of the manuscript. It is okay to state this in the main body of your review too. Just remember you have a positive duty to disclose.
* Especially for papers that need a lot of work, it's not a good use of time to note every small mistake. You do not need to be the copy editor. If there are small technical mistakes, e.g., the lines on a graph are switched or the notation is messed up, put them in a section for minor comments.
* For papers involving code, spend about 10 min trying to run the code and checking its documentation, but do not reimplement the analysis unless you want to. You also do not need to check complex mathematical derivations. However, the methods should be clear and completely reproducible from the content of the manuscript. (I am not a fan of "See previous papers $X_{t_1}$, $X_{t_2}$,...", though a bit is okay.)
* It's fine, even good, to comment on other reviewers' comments in later rounds of review, especially if you disagree with them. If you think a reviewer has made a major error, email the editor.
* Try to limit your likely biases in peer review. People often favor manuscripts by authors of the same gender and nationality ([Murray et al. 2018](https://www.biorxiv.org/content/biorxiv/early/2018/08/29/400515.full.pdf)). (There are also biases in the selection of reviewers; [Helmer et al. 2017](https://elifesciences.org/articles/21718).)
## Have productive meetings
### Research meetings
**Before the meeting:**
1. Make sure every meeting has a purpose that everyone understands. It is good if you can send an agenda beforehand. Some people also like to review materials, such as summaries, beforehand. Ask whether they want this.
2. If proposing an ad hoc meeting, give an estimate of how much time it should take in your invitation. This will help people focus during the meeting.
3. If the meeting is routine, let the other participants know in advance if you expect it to be especially short or long.
4. I dislike meeting reminders, but some people need them. Find out.
**At the meeting:**
1. Quickly review the agenda and the meeting's purpose.
2. If discussing research, ensure you give appropriate background information and context, and ensure your figures and numbers are clear (even if they're not "pretty").
3. Take notes rather than forcing yourself to recall things later.
4. End by summarizing the next steps, responsibilities, and timeline.
5. Keep the meeting on track: If a less relevant discussion takes off, flag this as a topic to address later.
**After the meeting:**
1. For committee meetings, meetings with collaborators, etc., send a short follow-up email summarizing what was decided and what will happen next.
For meetings with me, update tasks in Asana; additional notes can go there or in your lab notebook.
2. Follow up on those commitments.
### Seminar speakers
Try to meet with seminar speakers who do relevant work.
This is just fun, and it also helps people get to know you and your research.
Think of their visits like an intermittent conference without the annoying travel.
Prepare for the meetings by reading at least one of their papers, skimming their other works, and writing a list of questions that would be fun to discuss.
If you're having trouble getting on the schedule, let me know.
## Book travel
The general idea is to reduce costs as much as possible while remaining comfortable and productive.
(These savings will go toward more travel, research, and fun lab things.)
The grants that fund travel have different allowable expenses and documentation requirements, so please check flights and your total budget with me before booking.
Guidelines:
* Imagine the money as your own. Please plan your travel far enough in advance that we are not paying through the nose for registration or flights.
Please book flights at least six weeks in advance, unless you're really confident the price is dropping.
For conferences, book flights earlier.
* UChicago has discounts with various airlines, hotels, etc. [Check them](http://finserv.uchicago.edu/purchasing/travel/index.shtml).
You may need to use the University's travel agency or use a special website (e.g., swabiz.com for Southwest).
(Some of these "deals" should probably be checked against Hotwire or Priceline.)
* Bonnie can book the flight for you so you do not have to pay and then be reimbursed. If you pay upfront yourself, you will have to wait until after the travel is complete to be reimbursed.
* If you book an atypical flight, such as something arriving a few days early or leaving a few days late, or that includes personal travel, funding groups generally require that you also include a quote, *obtained at the time of booking*, of the cost of the flight for typical (business-only) travel.
They'll only reimburse up to this amount. But it's otherwise totally fine to include personal travel with business, as long as you document carefully.
* If your travel is funded by the federal government, you generally have to fly on a U.S. carrier or on a codeshare flight booked through a U.S. carrier.
* You are not required to share hotel rooms or use Airbnb, but if you do, it's appreciated.
* It's also great if you can share rides/taxis and take public transit, but you're not expected to go to great inconvenience to save money.
* The University will reimburse only original itemized receipts; it will not give you a per diem. Food costs are reimbursable up to federal rates if covered by a federal grant, or $100 if from a non-grant account, per University policy. I think the principled thing to do is to submit receipts only for food costs above what you'd normally spend (and not to go crazy with spending). Note the receipts must be itemized, and alcohol cannot appear on the receipts of NIH-funded travel.
* Internet charges cannot be expensed to federal grants.
* Submit receipts to Bonnie within one month of travel.
* I suggest you sign up for airline loyalty programs if you haven't yet.
Southwest is pretty good: Any flight can be paid for in points (miles), in that there are no annoying blackout dates or hoops to jump through, and you can change flights without a fee.
American Airlines is basically the worst.
## Be happy doing research
> But I am very poorly today & very stupid & hate everybody & everything. One lives only to make blunders.— I am going to write a little Book for Murray on orchids & today I hate them worse than everything so farewell & in a sweet frame of mind, I am
>
> Ever yours
>
> C. Darwin
If you're excited to solve the problems you're working on and to communicate them to the world, research is great.
Sometimes things can get in the way.
Major obstacles and tips to avoid them are below.
If you think something is missing from this section, please let me know or add it.
### Time management
A critical skill is to identify your priorities, understand how you work, and learn how to allocate your effort to get the most important things done and avoid overburdening yourself.
The [National Center for Faculty Development and Diversity](https://www.facultydiversity.org/) has excellent materials, designed for faculty but relevant for everyone, on helping you use your time well.
You should be able to get free access to the videos and tools through the University.
If you're regularly feeling overwhelmed by tasks or unhappy with your progress, this is also something we can discuss at weekly meetings.
(Full disclosure: I'm far from perfect and perennially trying to improve in this area.)
As mentioned, I think 40 h of carefully chosen, focused work per week is enough to get things done.
Practical suggestions:
* I use the [Freedom](https://freedom.to/) app and [Pomodoro Technique](https://en.wikipedia.org/wiki/Pomodoro_Technique) when I'm having trouble focusing or really resisting some task. Often when I'm resisting a task there's some emotion behind it (e.g., not wanting to be bored), and recognizing that emotion and setting a timer (I can handle 20 min of boredom) helps me avoid procrastination.
* Every important task, or any task >5 min, goes into Asana and is immediately placed on my calendar. This helps things get real: it's harder to be deluded about how much time I have.
* I find it useful to compare my scheduled day to how I actually spent my time. It has made me realize the necessity of adding a bit of extra time for spontaneous meetings, scheduling brainless tasks after teaching, etc.
* I also give myself weekly and quarterly goals, and do the same comparisons.
* Working or accountability groups/buddies can be great. If you know others who are struggling to read, write, etc., regularly, consider setting up formal work sessions or accountability reports. The buddies don't have to be local.
### Imposter syndrome
It's really common and can't completely be cured.
My best advice is to practice acknowledging doubt and then moving on to whatever you want to do.
(I like [this post](https://psycgirl.wordpress.com/2016/07/22/the-tale-of-the-unwritten-manuscript/) from psycgirl.)
Because research involves working on unsolved problems with an ever-expanding set of tools, and the world is complicated, we have to be comfortable pushing through the discomfort of ignorance and mediocrity ([Schwartz 2008](http://jcs.biologists.org/content/121/11/1771)).
Once you accept this, it can mostly be fun to work on interesting problems with great people.
In general, you should not conflate your *or anyone else's* confidence and competence (see the [Dunning-Kruger effect](https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect)).
This is important in making science more equitable.
### Mental health and medical problems
Please take a "mental health day" if you need one, and if you feel stuck, I encourage you to consider talking to a therapist.
If you are new to therapy, keep in mind that there's enormous variation in quality and style between therapists.
If you don't feel like you're making progress with a therapist, find a new one.
Remember that many University health plans cover therapists who are not on campus.
Of course, there are a bazillion ways to maintain our mental health, and I encourage you to develop [multiple strategies](http://drtregoning.blogspot.com/2015/05/using-pop-songs-to-maintain-good-mental.html) if you're feeling strained.
If you're not getting good medical care or you have a condition that interferes with your work, please let me know so we can find better care and/or accommodate your needs.
### Unwelcome environment
If your environment is creating difficulty, e.g., the University or lab does not feel like a welcoming place, or you are under great financial strain, please let me know.
Your workplace should obviously be supporting you.
A note to students: Under [Title IX](https://voices.uchicago.edu/equity/title-ix/), if you speak with me about sexual misconduct, I am required to talk to the Title IX Coordinator about it. The University may proceed with an investigation (potentially despite your wishes). If you wish to maintain confidentiality, there are a [variety of people](https://voices.uchicago.edu/equity/title-ix/confidential-resources/) you can talk to.
## Give a good talk
I like this [advice from Jonathan Shewchuk](https://people.eecs.berkeley.edu/~jrs/speaking.html).
If you're scared to get started, read [Tim Urban's](https://waitbutwhy.com/2016/03/doing-a-ted-talk-the-full-story.html) essay about his TED talk for inspiration.
If it's the first time you've given the talk, sign up on the lab calendar to give a practice presentation at least a week before the talk itself.
## Interview someone
We have frequent opportunities to meet with potential hires, especially prospective graduate students, postdocs, and faculty.
These are usually great opportunities to talk research with someone new, but they're also critically important from an institutional perspective.
You play an essential role in helping identify top performers.
We want colleagues who will not only do great work but who will make the lab, department, etc., a more invigorating place to be.
Please take advantage of opportunities to meet with people and learn about what they do.
You can take it as read that whoever is doing the hiring wants feedback promptly: try to provide feedback (in person, over the phone, or in writing) quickly, e.g., <24 h.
Please also let me know if you think particularly great people are on the market!
It's vitally important to decide *before* we meet a potential hire what criteria we should use to judge them: studies show that we often subconsciously rationalize biases by identifying criteria post hoc.
I think academics in particular are prone to discrimination because their self-identity is so predicated on objectivity.
It is useful to have a core, fixed set of questions or topics to discuss with different candidates.
This doesn't mean the conversation can't wander, but it promotes fair comparisons.
I'm happy to talk about the criteria I use for different positions.
Please keep in mind that many laws exist to protect people from discrimination, and they affect what potential employers can and cannot ask interviewees.
Even though you might not have hiring authority, as a representative of the University, you should avoid asking these questions too, even indirectly.
(You might not have any intent to discriminate, but the questions could rattle the candidate, and others who hear might be inappropriately influenced.)
Do not ask questions about race, color, national identity, or citizenship; religion; sex, gender identity, or sexual orientation; pregnancy status, marital status, or parenthood status; disability; and age.
For instance, do not ask what languages someone speaks (unless it is somehow relevant to the position, which it usually isn't), about their accent, where their parents were born, their partner's job, or whether they have a partner.
It is especially inappropriate to bring up two-body issues when discussing candidates until an offer has been made.
Questions related to economic status (e.g., car or home ownership, debt, etc.) are also unwise.
The basic principle is equity.
Equity is a moral imperative.
It also has the handy feature of broadening the talent pool for any position and probably accelerating the pace of science.
From this principle, it follows we should not discriminate or draw conclusions about scientific and professional merit based on a huge class of dumb things, like whether someone wears makeup, seems really excited about sports, programmed in Fortran at age 3, knows your friends, drinks socially, etc.
We should make an effort to work well with people who are different from us.
I have heard almost every type of inappropriate interview question in academia.
It's pretty sad.
If you hear someone asking one of these questions, do your part by telling the candidate they don't need to answer and/or immediately changing the topic of conversation.
Candidates will often answer these questions anyway or even volunteer protected information on their own.
Do not follow up, and attempt not to be influenced by the information.
## Contribute to the handbook
I'd love to make the handbook as useful as possible.
Please contribute if you see ways to improve it (especially if you have CSS skills!).
The handbook repository is in the lab's GitHub account.
Submit your changes as a pull request.
If you'd like to make many contributions, let me know, and I'll add you as an owner.
## Win at conferences
There are a bazillion resources on this. I think they boil down to the following:
* Try to develop a list of people you'd like to talk to before you go, and have an idea of what you'd like to discuss with them. It can sometimes help to send an email in advance if there's someone you really want to connect with. You can set up a time and place to meet.
* Pace yourself. Get sleep. Go to talks, but not necessarily all of them. Make time for dinner and socializing. Ask people to join you for dinner.
* Practice asking speakers questions.
* Get in the habit of introducing yourself, asking people about their research, etc.
* Avoid spending much time with lab members. Really, the opportunity cost of hanging out with lab members is big. Meeting new people might not always feel like it amounts to much, but it will pay big dividends, I promise. You'll see many of them again. You'll probably collaborate with a few.
## Negotiate authorship
We try to follow the [APA guidelines](https://www.apa.org/research/responsible/publication/) for determining authorship:
> Authorship credit should reflect the individual's contribution to the study. An author is considered anyone involved with initial research design, data collection and analysis, manuscript drafting, and final approval. However, the following do not necessarily qualify for authorship: providing funding or resources, mentorship, or contributing research but not helping with the publication itself. **The primary author assumes responsibility for the publication, making sure that the data are accurate, that all deserving authors have been credited, that all authors have given their approval to the final draft; and handles responses to inquiries after the manuscript is published.**
(Emphasis mine.)
Not everyone we work with follows these guidelines, and they can differ from journals' policies.
We'll talk about it.
Authorship frequently needs to be [renegotiated](https://www.apa.org/science/about/psa/2015/06/determining-authorship.aspx).
It's better not to postpone this.
Please talk to me if you are unclear about authorship.
In general, I expect first authors to be corresponding authors, unless they want to pass responsibility for future communication to me (or whoever's the senior author).
## Engage with the public
**Locally:** The University has established relationships with local schools through the [Neighborhood Schools Program](https://nsp.uchicago.edu/) and with the community through the [Office of Civic Engagement](https://civicengagement.uchicago.edu/education/tutoring-enrichment/).
We also sometimes talk with local journalists and radio hosts (e.g., on WBEZ).
Let me know if you think there's something we should share.
**And beyond:** If you're interested in educational outreach, consider [Skype a Scientist](https://www.skypeascientist.com/). If writing is more your thing, check out the [OpEd Project](https://www.theopedproject.org/).
<!--chapter:end:04-so_you_wanna.Rmd-->
# Coding Handbook
## Justification
The practice of science requires special care to ensure integrity. Not only do we want to know our results are correct, we also need to demonstrate this to outside collaborators, institutions, publishers, and funders. The standards these groups apply are also rising, particularly for data and code. Below are excerpts from the checklists for researchers, journals, and research sponsors in *Fostering Integrity in Research* (2018):
Researchers:
* Develop data management and sharing plan at the outset of a project.
* Incorporate appropriate data management expertise in the project team.
* Understand and follow data collection, management, and sharing standards, policies, and regulations of the discipline, institution, funder, journal, and relevant government agencies.
Journals:
* Provide a link to data and code that support articles, and facilitate long-term access.
* Require full descriptions of methods in method sections or electronic supplements.
Research sponsors:
* Develop data and code access policies for extramural grants appropriate to the research being funded, and make fulfillment of these policies a condition of future funding.
* Cover the costs borne by researchers and institutions to make data and code available.
* Practice transparency of data and code for intramural programs.
* Promote responsible sharing of data in areas such as clinical trials.
One of the main reasons research has changed with respect to ensuring integrity in the last few decades is the increasing role of data and computer software. New norms, standards, and training are required, and new opportunities for communication and reuse are available.
## Coding Culture
As with any high-stakes endeavor, it's important to think about how our treatment of each other contributes to success. Software development demands accuracy, depends on people relying on each other's work, and requires human judgment. Culture can therefore greatly affect productivity and resiliency.
### Cultural practices
1. Be open with your code and your understanding. Coding productivity depends on information, and slow or restricted communication is a bottleneck. Passing information along as quickly and openly as possible helps overcome this and facilitates progress. The communication problem has been known since the [1970s](https://en.wikipedia.org/wiki/The_Mythical_Man-Month), yet it is easy to forget.
2. Be charitable with your feedback. No error is so obvious that even the most experienced programmer won't make it from time to time. Break your pull request (PR) reviews into demonstrable chunks, each of which you can support with a code snippet. Don't make sweeping or vague judgments.
3. Be thick-skinned. Code has a way of seeming absolute and damning when you get it wrong. That is simply its nature: any error will feel that way, and everyone makes [mistakes](https://en.wikipedia.org/wiki/List_of_software_bugs).
## Complexity
> The art of programming is the art of organizing complexity.
--- Edsger W. Dijkstra
Programs tend to be difficult to understand. They are written by someone with roughly the same capacity for complexity as the reader, but with the advantage of having written them. The author will usually write to the limit of *their own* understanding, because there is pressure to make programs as sophisticated and full-featured as possible.
There are strong incentives throughout computing to write complex code. There is also a limitless number of ways to write code that is functionally identical.
It's also important to remember that complexity is a force of nature. Once a program has enough possible states that are difficult to characterize (which is easy to achieve), it becomes impossible for any human brain to understand it completely. This doesn't happen for all programs, but because of combinatorics, the point at which an application becomes complex can easily sneak up on the author.
## Why writing code is easier than reading it
> The process of understanding a code practically involves redoing it.
--- John von Neumann
Take a function in a large codebase. The person who wrote it understands:
1. The expected (or possible) range of inputs for this particular application. The number of possible argument values can easily be in the trillions, even for a simple function. The possible range will often depend on the entire rest of the codebase and will usually be implicit.
2. How often the function is called at runtime. (Note: This is different from the number of times it is referenced in the codebase.)
3. The intentions of the code. This can be different from what it *does* and is simpler to understand. (Usually intention vs reality is cleared up with comments.)
4. The narrative of the code. The history of a codebase is a powerful mnemonic device. "We wrote this because there was an issue in January. There are three other places this functionality is handled."
All of this asymmetry between the author and the reader comes on top of the raw size of the source code, and each item above grows with the combinatorial explosion of interconnected components.
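A quick back-of-the-envelope count makes the scale concrete. This is plain arithmetic, assuming arguments are fixed-width 32-bit integers (as in C or NumPy's `int32`) and, separately, a hypothetical program whose behavior depends on 20 independent on/off flags:

```python
# Distinct inputs for a function taking two 32-bit integers.
two_int_inputs = (2 ** 32) ** 2
# Distinct states for a program whose behavior depends on 20 independent on/off flags.
flag_states = 2 ** 20

print(f"{two_int_inputs:,}")  # 18,446,744,073,709,551,616
print(f"{flag_states:,}")     # 1,048,576
```

Each extra flag doubles the count, which is how complexity sneaks up on an author.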
### Science and Software
Some coding principles are *less important* in science because:
* Scientific applications are more mathematical. You can reason about the range of values more easily.
* They can be very short.
* They are often meant to be run one way. For example, a Jupyter notebook that is intended to run in the order the cells are in on the page.
Some coding principles are *more important* in science because:
* Scientific applications are meant to be read. They are intended to teach and be verified.
* They are meant to be open-source, and reused.
* They are at the forefront of human understanding. Extra complexity is detrimental.
* They are sensitive to error. The stakes are high.
## Indirection, Abstraction, and Generalization
Indirection, abstraction, and generalization are three closely-related concepts in programming.
Indirection is the most general in that it refers to any symbolic representation of a process in the place of the process itself. A function calling another function, for instance.
Indirection can reduce complexity and multiply the number of cases a piece of code can handle, but it can also *be a source of complexity itself*.
The so-called [Fundamental Theorem of Software Engineering](https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering), attributed to David J. Wheeler, is: "We can solve any problem by introducing an extra level of indirection. (Except for the problem of too many levels of indirection.)"
Abstraction is also very general but refers to the process of removing details that are not relevant to some concept one is trying to model.
Finally, generalization is very similar to abstraction, but carries the connotation of combining the functionality of several similar pieces of code into one, usually parameterized, copy.
Thinking about how accurately, simply, and powerfully your code represents what is being modeled is important because it makes your code more useful and more understandable, and because it makes your code more mathematical: you've distilled a model to its essence.
### Example: [The Weasel Algorithm](https://en.wikipedia.org/wiki/Weasel_program#Weasel_algorithm)
``` python
from random import choice, random
charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
target = list("METHINKS IT IS LIKE A WEASEL")
# create a random parent
parent = [choice(charset) for _ in range(28)]
while parent != target:
    # calculate how fast to mutate depending on how close we are to the target.
    rate = 1-((28 - (sum(t == h for t, h in zip(parent, target))
                     )) / 28 * 0.9)
    # initialize ten copies of the parent, randomly mutated.
    mut1 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut2 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut3 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut4 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut5 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut6 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut7 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut8 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut9 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    mut10 = [(ch if random() <= rate else choice(charset)) for ch in parent]
    # put the mutants and the unmutated parent in an array
    copies = [mut1, mut2, mut3, mut4, mut5, mut6, mut7, mut8, mut9, mut10, parent]
    # pick most fit parent from the beginning of the list
    parent1 = max(copies[:4], key=lambda trial: sum(t == h for t, h in zip(trial, target)))
    # pick most fit parent from the end of the list
    parent2 = max(copies[4:], key=lambda trial: sum(t == h for t, h in zip(trial, target)))
    # choose a place in the "genome" to split between the two parents.
    place = choice(range(28))
    mated = [parent1, parent2, parent1[:place] + parent2[place:], parent2[:place] + parent1[place:]]
    # choose most fit amongst the parents and two progeny
    parent = max(mated, key=lambda trial: sum(t == h for t, h in zip(trial, target)))
    print(''.join(parent))
print('Success! \n', ''.join(parent))
```
This code has several problems.
1. There's a lot of duplication. Initializing the array requires as many lines as there are elements in the array. `sum(t == h for t, h in zip(trial, target))` is copied anywhere it is needed, etc.
2. It only works for specific lengths of `target` and `copies`: to change either, a user would need to modify the code instead of changing a parameter. Literal values like 4 and 28 are used instead of variables (hardcoding).
3. It's not conceptual. If it weren't for the comments, it would be difficult to tell what the author is getting at. What does 28 mean in this context? Is it the same as other 28s?
4. It would be difficult to maintain (particularly if this style was used in a large program). Will the program still work if we change some of it? Is there an error in some of the duplicated code? How do we add functionality without adding more duplication?
All of these are problems that can be solved by generalization.
This code was adapted from a much more general version at [Rosetta Code](https://www.rosettacode.org/wiki/Evolutionary_algorithm#Python). In some ways the less general version is easier to understand. Your eye doesn't need to jump around as much to see what is going on. Usually, though, the more general code is preferred.
How a particular program should be written is a judgment call based on its size, predicted longevity, and what is most clear to readers. If you're starting to lose productivity or bugs are difficult to fix, it might be time to generalize. As with almost any engineering topic, generalization is a tradeoff and can be taken too far.
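For comparison, here is a minimal sketch of a more general version, loosely following the structure of the Rosetta Code implementation. The function names, the default of 100 copies, and the `seed` parameter are choices made here for illustration, and the target is assumed to use only letters and spaces:

```python
import random
import string

CHARSET = string.ascii_letters + " "


def fitness(trial, target):
    """Count the positions where trial matches target."""
    return sum(t == h for t, h in zip(trial, target))


def keep_rate(parent, target, min_keep=0.1):
    """Probability of keeping each character; rises as the parent nears the target."""
    perfect = len(target)
    return 1 - (perfect - fitness(parent, target)) / perfect * (1 - min_keep)


def mutate(parent, rate, rng):
    """Copy parent, replacing each character with probability 1 - rate."""
    return [ch if rng.random() <= rate else rng.choice(CHARSET) for ch in parent]


def mate(a, b, rng):
    """Return both parents plus the two children of a single-point crossover."""
    place = rng.randrange(len(a))
    return [a, b, a[:place] + b[place:], b[:place] + a[place:]]


def weasel(target_string, n_copies=100, seed=None):
    """Evolve a random string until it equals target_string.

    Returns the final string and the number of generations it took.
    """
    if not set(target_string) <= set(CHARSET):
        raise ValueError("target contains characters outside CHARSET")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    rng = random.Random(seed)
    target = list(target_string)

    def score(trial):
        return fitness(trial, target)

    parent = [rng.choice(CHARSET) for _ in target]
    generations = 0
    while parent != target:
        rate = keep_rate(parent, target)
        # Include the unmutated parent so the best fitness never decreases.
        copies = [mutate(parent, rate, rng) for _ in range(n_copies)] + [parent]
        half = len(copies) // 2
        parent1 = max(copies[:half], key=score)
        parent2 = max(copies[half:], key=score)
        parent = max(mate(parent1, parent2, rng), key=score)
        generations += 1
    return "".join(parent), generations
```

Every literal from the original (28, 4, ten copies) is now either derived from the data (`len(target)`, `len(copies) // 2`) or a named parameter, so `weasel("METHINKS IT IS LIKE A WEASEL")` and `weasel("HELLO", n_copies=20)` run the same code. The cost is that the reader has to jump between small functions to follow one generation.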
## Debugging
Debugging and experimentation are fundamentally the same. Debugging is done by isolating variables to identify the cause of a problem: you compare the results of two runs of a program with one variable changed. If one run reproduces a bug and the other doesn't, you can usually conclude that the value of that variable is the cause. ("Variable" may not be a literal variable. It may be a short section of code, or input.)
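One common way to apply this is to bisect the input: run the program on half the data, check whether the bug reproduces, and keep the half that does. A minimal sketch, assuming the failure is reproducible and caused by exactly one input item (`find_bad_input` and `fails` are hypothetical names):

```python
def find_bad_input(inputs, fails):
    """Return the single item in inputs that makes fails(...) return True.

    Assumes the failure is reproducible and triggered by exactly one item.
    """
    candidates = list(inputs)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # One run, one change: keep only the first half and see if the bug persists.
        candidates = first_half if fails(first_half) else candidates[mid:]
    return candidates[0]


def fails(values):
    """Stand-in for 'run the program and check whether the bug appears'."""
    try:
        sum(1 / v for v in values)
    except ZeroDivisionError:
        return True
    return False


print(find_bad_input([3, 5, 2, 0, 7, 1], fails))  # 0
```

Each loop iteration is one controlled experiment, and the number of runs grows only logarithmically with the size of the input.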
Modular, functional code is easier to debug than the opposite: arbitrarily interconnected code. This is because it's easier to understand, and it's easier to “change one thing” and understand the outcome. Code that is easy to debug is also generally [modular](#modules-and-modularity), [functional](#functional-programming), and [testable](#testability).
## Functional Programming
The term "functional" comes from Functional Programming, which is a discipline in which:
1. Functions always return the same output for a given input. For instance:
``` python
def f(x):
    return x + 2
```
This `f` always returns x + 2 for every x, no matter the context or the time.
``` python
from random import randint

def f(x):
    return x + randint(0, 10)
```
This one does not.
2. Functions have no side effects: a function can't modify any variables outside itself.
``` python
g = 0

def f(x):
    global g
    g = g + 1
    return x + 2
```
This `f` modifies `g`, which is outside the scope of the function. Side effects include changes to external systems as well. Modifying a database, for instance, counts as an out-of-scope change, and can affect future runs of a function with the same input.
These properties guarantee that a program is [referentially transparent](https://en.wikipedia.org/wiki/Referential_transparency). You can easily modify such a program because you can replace any call of a function with its value, and you can move functions without concern that their behavior will change. Additionally, variables that can be changed by many different functions (in the worst case, global mutable variables) add complexity to programs. This is analogous to running an experiment in which variables can't be controlled: the program's state includes variables that could hold any value at any time and may radically change the program's behavior. A function in functional programming (also known as a pure function) can be tested just by changing its arguments.
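As a sketch (the name `f_counted` is hypothetical), the call counter from the side-effect example can be made pure by passing the count in and returning the updated count, instead of modifying a global:

```python
def f_counted(x, calls):
    """Pure version of f: returns x + 2 along with the updated call count."""
    return x + 2, calls + 1


calls = 0
result, calls = f_counted(3, calls)
result, calls = f_counted(result, calls)
print(result, calls)  # 7 2

# Referential transparency: f_counted(3, 0) can be replaced by its value anywhere.
assert f_counted(3, 0) == (5, 1)
```

The caller now owns the state explicitly, so the same arguments always produce the same result and nothing outside the function changes behind the reader's back.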
The benefits of functional programming are closely related to [modularity](#modules-and-modularity). Functional programs are modular in that every function encapsulates some functionality and has a well-defined interface, the function signature.
## Unit Testing
Unit testing is a technique for verifying a codebase by writing sample input and expected output for a number of its functions. These tests -- usually structured as functions themselves -- are binary. They either run to completion, or they throw an exception. Failure to run represents failure of the test.
Unit tests are generally run without input. The input to _the function being tested_ is written as literals (`3`, `"blue"`, `[6, 7, 8]`) within the test function, or globally for several tests to use. The input to a test, along with any variables and environment the function needs to run, is called a *fixture*.
A _test suite_ is the set of all tests of a codebase. Usually test suites are run in their entirety, giving a simple output with the percentage of tests that passed. The codebase (usually called a "build" in this context) is said to be *passing* or *failing*.
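Python's standard-library `unittest` module is one way to organize a suite. This sketch tests a hypothetical `mean` function, with `setUp` supplying a fixture shared by the tests:

```python
import unittest


def mean(values):
    """Function under test: arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def setUp(self):
        # Fixture: literal input rebuilt before every test in this class.
        self.values = [6, 7, 8]

    def test_mean_of_fixture(self):
        self.assertEqual(mean(self.values), 7)

    def test_single_value(self):
        self.assertEqual(mean([3]), 3)

    def test_empty_list_raises(self):
        # Exceptions are behavior too, so they get a test.
        with self.assertRaises(ZeroDivisionError):
            mean([])
```

Saved as `test_mean.py` (a hypothetical file name), `python -m unittest test_mean` runs the whole suite and ends with `OK` if every test passes, or lists the failures otherwise.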
Unit testing serves two major purposes.
1. A unit test is a declaration by the author that an input/output pair is "correct." It represents what the function *should* do. A programmer could write a unit test `test_add` of the function `add`:
``` python
def add(a, b):
    """One of the four basic arithmetic operations. Takes two numbers -- a and b -- and returns a + b."""
    return a + b + 1


def test_add():
    assert add(1, 1) == 3
```
This test passes: it verifies that the input/output pair (a = 1, b = 1)/3 is consistent with the function `add` as written, but it does not validate that `add` is functioning properly (at least according to the way `add` has been described).
A valid test:
``` python
def test_add():
    assert add(1, 1) == 2
```
would uncover the fact that `add` was either written incorrectly, or was recently broken.
When writing tests, it's good to check yourself by not deriving expected outputs from the function as it is written. Take input/output pairs from another, reliable source, or think about the problem and write what you believe is the correct output. A programmer who runs `add` and then writes a test with (a = 1, b = 1)/3 would be perpetuating the error instead of correcting it.
2. Unit tests verify that changes to a codebase don't have unexpected effects. In other words, they help compare two versions of the code to show the functionality (set of input/output pairs) is the same.
This is helpful for refactoring or rewriting where many changes are being made and the application needs to be verified repeatedly.
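A sketch of this second use, with two hypothetical implementations: the expected pairs are written by hand, and the same test checks both the original loop-based version and a refactored one, so a passing build shows the refactor preserved behavior on these cases.

```python
def total_v1(values):
    """Original implementation: explicit loop."""
    result = 0
    for v in values:
        result += v
    return result


def total_v2(values):
    """Refactored implementation: built-in sum."""
    return sum(values)


# Input/output pairs worked out by hand, not copied from either function's output.
CASES = [([], 0), ([5], 5), ([1, 2, 3], 6), ([-4, 4], 0)]


def test_refactor_preserves_behavior():
    for values, expected in CASES:
        assert total_v1(values) == expected
        assert total_v2(values) == expected
```

In practice `total_v1` would be deleted once the refactor lands; the test stays behind to guard the new version.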
### Limitations to Unit Tests
Unit tests are not proofs. Each test checks one input/output pair that stands in for many pairs in the input/output space. It is always possible that an untested pair is critical to the functioning of the program, and that the program does not handle it as intended.
It is important, therefore, that unit tests are written for representative pairs. For instance:
``` python
def add(a, b):
    if a == 3:
        return a + b + 1
    return a + b


def test_add_ones():
    assert add(1, 1) == 2
```
The test `test_add_ones` is representative of most integer inputs, but it ignores the conditional that modifies the behavior when a = 3. Real-world examples will be much less obvious, so this is common: the untested path may be one of thousands, buried in tens of thousands of lines of code.
Because not all of the paths that characterize the behavior of the function are tested, this codebase could be said to have insufficient "[coverage](https://en.wikipedia.org/wiki/Code_coverage)." Testing tools can usually measure coverage, which is useful for finding these cases.
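Adding a case that exercises the `a == 3` branch exposes the problem. A minimal sketch, repeating the `add` above, that checks a small table of hand-written cases and collects the ones that fail:

```python
def add(a, b):
    # Same function as above: the a == 3 branch is the hidden bug.
    if a == 3:
        return a + b + 1
    return a + b


# Hand-written cases; (3, 1) is the one that reaches the a == 3 branch.
CASES = [((1, 1), 2), ((0, 5), 5), ((3, 1), 4)]

failures = [(args, expected, add(*args)) for args, expected in CASES if add(*args) != expected]
print(failures)  # [((3, 1), 4, 5)]
```

Tools such as coverage.py can report lines (and, optionally, branches) that no test executed, which is how a gap like the missing `(3, 1)` case is usually spotted.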
### Testability
A function is easier to test when there's a simple way to characterize its behavior with some inputs and expected outputs. This generally means small, easy-to-write input/output pairs. For example:
``` python
def f(i, j, s):
    if i > j:
        return s + " is above the line"
    else:
        return s + " is below the line"


def test_low_f():
    assert f(2, 3, "test") == "test is below the line"


def test_high_f():
    assert f(3, -1, "test") == "test is above the line"
```
``` python
import datetime, random, urllib.request

def g(i, j, s):
    is_thursday = datetime.date.today().weekday() == 3
    if i > j and is_thursday and urllib.request.urlopen("http://line.status.org") and random.randrange(1, 10) > 2:
        return "It's Thursday and the line status is good and you're lucky."
```
Function f:
1. Is purely functional. It doesn't modify or depend on values outside its scope, and always returns the same values for the corresponding input.
`f` is easy to test. `test_low_f` and `test_high_f` cover both branches (the if and the else) and characterize the behavior of `f` well. Note: It's a judgment call whether enough of the input space has been tested. In this case it's easy to see that all integers will behave predictably. These tests also don't cover exceptions (for example, what happens when `s` is not a string), which should generally be tested.
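Exception behavior can be tested the same way. A sketch in plain Python (the test name is hypothetical), checking that `f` raises `TypeError` when `s` is not a string:

```python
def f(i, j, s):
    if i > j:
        return s + " is above the line"
    else:
        return s + " is below the line"


def test_f_rejects_non_string():
    # Adding an int to a str raises TypeError, so f should too.
    try:
        f(2, 3, 42)
    except TypeError:
        return
    raise AssertionError("expected TypeError")
```

Test frameworks offer shorthands for this pattern, such as `assertRaises` in `unittest`.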
Function g:
1. Works differently depending on the day, the status of an external web site, and a random number. The behavior of g depends on a lot of values, and values that are outside the scope of the function.
2. Does not handle all values of its parameters.
`g` is not purely functional and is very difficult to test. The state needed to get a predictable output is difficult to prepare. (Functions should also almost never fail silently, and `g` silently returns `None` whenever its condition is false.)
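One common remedy is to pass the outside state in as arguments, so the core logic becomes pure and only a thin wrapper touches the clock, the network, and the random number generator. A sketch under those assumptions (`g_pure` and `g_from_environment` are hypothetical names, the unused `s` parameter is dropped, and the URL is the placeholder from the example above):

```python
import datetime
import random
import urllib.request


def g_pure(i, j, is_thursday, line_status_ok, roll):
    """Pure core of g: every input that used to be hidden is now a parameter."""
    if i > j and is_thursday and line_status_ok and roll > 2:
        return "It's Thursday and the line status is good and you're lucky."
    # Say explicitly that the condition failed instead of silently returning None.
    return "Conditions not met."


def g_from_environment(i, j):
    """Thin, untested wrapper that gathers the real-world inputs."""
    is_thursday = datetime.date.today().weekday() == 3  # Monday is 0
    with urllib.request.urlopen("http://line.status.org", timeout=5) as response:
        line_status_ok = response.status == 200
    roll = random.randrange(1, 10)
    return g_pure(i, j, is_thursday, line_status_ok, roll)
```

`g_pure` can now be tested like `f`, just by choosing arguments: `g_pure(3, 1, True, True, 5)` returns the lucky message, while `g_pure(3, 1, False, True, 5)` does not. The wrapper is kept small enough to check by eye.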