Zoobot v2 release #114

Merged: 323 commits, Apr 4, 2024
6c70398
revert to
mwalmsley Nov 6, 2023
34b80e3
ramdisk
mwalmsley Nov 6, 2023
7adc169
test on beluga
mwalmsley Nov 7, 2023
1d66218
debug sympy
mwalmsley Nov 7, 2023
0f605ed
longer run, more workers/cpus
mwalmsley Nov 7, 2023
6d3f528
oops slash
mwalmsley Nov 7, 2023
c6c1f39
fine, normal python
mwalmsley Nov 7, 2023
1e9ffc9
try 4 gpu v100, 9 workers, ramdisk
mwalmsley Nov 7, 2023
d015e51
still 4 gpu, now on SSD instead
mwalmsley Nov 7, 2023
df15eeb
only 3 workers per
mwalmsley Nov 7, 2023
53ce419
back to a100 big boy test
mwalmsley Nov 7, 2023
4dfbe91
2 a100
mwalmsley Nov 7, 2023
8bc5344
debug nodesplitter
mwalmsley Nov 7, 2023
ca96a86
debug generator
mwalmsley Nov 7, 2023
e0bd2db
slurm tinkering
mwalmsley Nov 7, 2023
c02c2eb
oops, 4 gpu
mwalmsley Nov 7, 2023
812b857
better logging
mwalmsley Nov 7, 2023
76b0819
more logging
mwalmsley Nov 7, 2023
60e0e1f
typo
mwalmsley Nov 7, 2023
9a821e4
repeat=1 for debugging still, add replace_sampler_ddp flag
mwalmsley Nov 7, 2023
c83714c
2.1 update
mwalmsley Nov 7, 2023
da24384
typo
mwalmsley Nov 7, 2023
45d836c
2 gpu scaling
mwalmsley Nov 7, 2023
ea0db1f
2 gpu
mwalmsley Nov 7, 2023
a49904c
logging
mwalmsley Nov 7, 2023
0f67ab6
1 gpu
mwalmsley Nov 7, 2023
2d8c3bb
scaling of a100 2x repeat5x
mwalmsley Nov 7, 2023
eb19ada
worked great, now 4x with the split
mwalmsley Nov 7, 2023
9bad295
add notes
mwalmsley Nov 9, 2023
044c69e
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Nov 9, 2023
a5f00ed
refactor, add mock wds func
mwalmsley Nov 18, 2023
d67fe54
mock wds training works locally, try on beluga
mwalmsley Nov 19, 2023
b66a1f3
force commit sh
mwalmsley Nov 19, 2023
7cd11a7
request less
mwalmsley Nov 19, 2023
84eb547
single gpu?
mwalmsley Nov 19, 2023
5ac6be9
use tmpdir
mwalmsley Nov 19, 2023
f2c89b2
it doesn't split! add my logging
mwalmsley Nov 19, 2023
a21463f
make once and exit
mwalmsley Nov 19, 2023
6181d40
1 task only
mwalmsley Nov 19, 2023
178e6fd
run on premade shards
mwalmsley Nov 19, 2023
2c8d07e
it worked - 8 batches? try 4 gpus
mwalmsley Nov 19, 2023
c5864a9
4 gpu works! how about 2 nodes, 4 gpu each?
mwalmsley Nov 19, 2023
5c77a4a
Multi-node works. Clean up.
mwalmsley Nov 19, 2023
9d9c991
prepare to update dependencies
mwalmsley Nov 20, 2023
42e18f2
make desi labelled webdatasets
mwalmsley Nov 20, 2023
dba85dc
typo
mwalmsley Nov 20, 2023
f93a47f
runs, 2xv100 next
mwalmsley Nov 20, 2023
abd1929
typo
mwalmsley Nov 20, 2023
11791ca
4 gpu
mwalmsley Nov 20, 2023
875b25c
typo
mwalmsley Nov 20, 2023
0212fcd
hangs during val metrics. Try disable wandb
mwalmsley Nov 21, 2023
a038550
csv logger
mwalmsley Nov 21, 2023
3147b93
wandb again but rank 0 only logging
mwalmsley Nov 21, 2023
dee1207
it's not wandb. Try disable all self.log
mwalmsley Nov 21, 2023
cb890a9
disable checkpointing callback
mwalmsley Nov 21, 2023
90888a8
limit batches, num_workers=1
mwalmsley Nov 21, 2023
bd1d019
works with limit_batches and num_workers=1.
mwalmsley Nov 21, 2023
742f1cc
broken with num_workers =1 and without train/val batch limit
mwalmsley Nov 21, 2023
15ed1ba
try with 2 gpu, 5 dataloader per, 10 chunks in train/val
mwalmsley Nov 21, 2023
b92935a
it runs with precisely split dataloaders and limit_val=1
mwalmsley Nov 21, 2023
a823db6
add callbacks back
mwalmsley Nov 21, 2023
01bbb43
restore sync_dist (since metrics were not the problem)
mwalmsley Nov 21, 2023
fead45a
training_step not compiling due to batch_idx arg changing
mwalmsley Nov 21, 2023
dbeced1
60 shards without compile
mwalmsley Nov 21, 2023
2769e36
run longer
mwalmsley Nov 21, 2023
4ebe7fa
23 hours
mwalmsley Nov 21, 2023
ccff0a1
try compile just the encoder
mwalmsley Nov 21, 2023
57dec26
compile encoder, works well locally
mwalmsley Nov 21, 2023
f1cf5c6
add notes
mwalmsley Nov 21, 2023
e15ed0e
single v100 with 10 cpu/workers
mwalmsley Nov 21, 2023
399d592
runs great on 1 gpu. Restart with save dir
mwalmsley Nov 21, 2023
6d17c6e
try train small-dim model, 2 gpu
mwalmsley Nov 22, 2023
032e52a
typo
mwalmsley Nov 22, 2023
e142683
pass aug args
mwalmsley Nov 22, 2023
487f466
typo
mwalmsley Nov 22, 2023
3303a9d
test priority
mwalmsley Nov 22, 2023
e9d3432
v2
mwalmsley Nov 22, 2023
3e27ae2
it works, queue for 23h full run
mwalmsley Nov 22, 2023
52718f7
try 512 features
mwalmsley Nov 22, 2023
c251988
restart 2gpu
mwalmsley Nov 22, 2023
9df8267
minimal transforms only speed test
mwalmsley Nov 22, 2023
b8d3fc0
it is def CPU limited. Add even more CPUs
mwalmsley Nov 22, 2023
1b169ab
set up for 300px training runs
mwalmsley Nov 23, 2023
e9c5604
Merge pull request #111 from mwalmsley/dependencies
mwalmsley Nov 23, 2023
7d81c92
typo
mwalmsley Nov 23, 2023
751b5d6
prediction adjustments
mwalmsley Nov 23, 2023
afae520
try maxvit on desi only
mwalmsley Nov 23, 2023
a393a92
tweaks
mwalmsley Nov 23, 2023
e60118b
maxvit runs, try 4 gpu in case small batches break
mwalmsley Nov 23, 2023
4d680d2
fix race condition
mwalmsley Nov 23, 2023
be5dda8
effnet v2 on 1 gpu
mwalmsley Nov 23, 2023
012547a
typo
mwalmsley Nov 23, 2023
94d1d5d
try pit xs
mwalmsley Nov 23, 2023
e1aebaf
pit s 64
mwalmsley Nov 23, 2023
8f188b9
notes on closing
mwalmsley Nov 23, 2023
b2dc816
pit s at b256 with 4 gpu
mwalmsley Nov 23, 2023
8f5f5f7
typo
mwalmsley Nov 23, 2023
8668f4c
xs 4 gpu why not
mwalmsley Nov 23, 2023
e381992
4gpu b0 why not
mwalmsley Nov 23, 2023
8a8bb01
maxvit_rmlp_small_rw_224
mwalmsley Nov 23, 2023
6776e72
vit small
mwalmsley Nov 23, 2023
08df999
vit tiny
mwalmsley Nov 23, 2023
ed6ef43
retry v2b0 4gpu
mwalmsley Nov 24, 2023
e22439d
typo
mwalmsley Nov 24, 2023
0018d88
normal effnet 4gpu redo
mwalmsley Nov 24, 2023
cb157b5
pit xs 4gpu redo
mwalmsley Nov 24, 2023
68eb38e
maxvit_rmlp_small_rw
mwalmsley Nov 24, 2023
3be5f2a
pit s
mwalmsley Nov 24, 2023
d633188
convnext tiny 1gpu
mwalmsley Nov 24, 2023
34a19f3
effnet b2
mwalmsley Nov 24, 2023
6eac15e
convnext small
mwalmsley Nov 24, 2023
fdeec72
typo
mwalmsley Nov 24, 2023
e763eb1
typo
mwalmsley Nov 24, 2023
07d88ce
try effnetb4
mwalmsley Nov 24, 2023
b979139
effnet b5
mwalmsley Nov 24, 2023
5854015
maxvit_rmlp_base_rw_224
mwalmsley Nov 24, 2023
fc89502
maxvit_rmlp_base_rw_224 on 16 gpus, just becase
mwalmsley Nov 24, 2023
5a77f5d
cache dir on multinode
mwalmsley Nov 24, 2023
93285be
5 worker version
mwalmsley Nov 24, 2023
076be8e
ran but hit nans, add gradient clip and w=0.05
mwalmsley Nov 24, 2023
1ec59bd
runs, restart w/ 4 gpu
mwalmsley Nov 24, 2023
1cdc977
wds support for ZoobotEncoder
mwalmsley Nov 24, 2023
754d003
autofill label cols
mwalmsley Nov 24, 2023
2835325
debug mismatched shuffle
mwalmsley Nov 24, 2023
d0b7ddd
rmlp base 4gpu
mwalmsley Nov 25, 2023
fda028a
subset frac
mwalmsley Nov 25, 2023
0c1d6e2
prepare for terrestrial
mwalmsley Nov 30, 2023
1e5fcc7
1 gpu
mwalmsley Nov 30, 2023
49db7f5
typo
mwalmsley Nov 30, 2023
a7be251
request less time
mwalmsley Nov 30, 2023
ff72870
try with terrestrial init, no norm
mwalmsley Nov 30, 2023
7873cf9
timm kwargs instead
mwalmsley Nov 30, 2023
fe9ad7b
silly typo
mwalmsley Nov 30, 2023
bc66792
4 gpu for speed, if it starts
mwalmsley Nov 30, 2023
027a40b
effnet terrestrial
mwalmsley Nov 30, 2023
d5fdb6f
dash missing
mwalmsley Nov 30, 2023
ac47a5a
little longer
mwalmsley Nov 30, 2023
ab89eb9
maxvit overfit faster last time at higher batch size
mwalmsley Nov 30, 2023
31b146f
try base from terrestrial init
mwalmsley Dec 1, 2023
bea4b42
refactor narval changes out of zoobot
mwalmsley Dec 8, 2023
10478f7
add per-q and per-campaign logging via schema metadata
mwalmsley Dec 19, 2023
cfe7694
overwrite
mwalmsley Dec 19, 2023
c3fe802
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Dec 19, 2023
f17e89f
temporarily force double
mwalmsley Dec 20, 2023
f16f773
force int labels
mwalmsley Dec 20, 2023
61615b9
log q/campaign losses with only rel. galaxies
mwalmsley Dec 20, 2023
8d28297
change floor
mwalmsley Dec 20, 2023
c95dc82
.int() for DirichletMultionial
mwalmsley Dec 20, 2023
43607f6
completely rework logging
mwalmsley Dec 20, 2023
eb8bd9a
typo
mwalmsley Dec 20, 2023
554826b
typo
mwalmsley Dec 20, 2023
e8aa6b6
continuing metric rework
mwalmsley Dec 20, 2023
d94150e
tweaks for foundation
mwalmsley Dec 25, 2023
d109603
small changes for new models
mwalmsley Jan 2, 2024
d99d193
small ssl tweaks
mwalmsley Jan 3, 2024
ee608cf
ukidss schema
mwalmsley Jan 4, 2024
7d1f379
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Jan 4, 2024
8d15167
ssl changes
mwalmsley Jan 4, 2024
862ee37
torchrgb
mwalmsley Jan 5, 2024
32c4f8b
needs merge
mwalmsley Jan 8, 2024
7db406c
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Jan 8, 2024
80517d5
allow nan strat
mwalmsley Jan 12, 2024
ce106f7
Merge branch 'narval-migration' of github.com:mwalmsley/zoobot into n…
mwalmsley Jan 12, 2024
23062a5
debugging
mwalmsley Jan 15, 2024
d3b4398
update labeldict
mwalmsley Jan 15, 2024
372bdb1
revert datamodule
mwalmsley Jan 15, 2024
e3124b7
add test dataloader
mwalmsley Jan 15, 2024
c0b2e41
add test metrics
mwalmsley Jan 15, 2024
d7386be
debug test metrics not appearing
mwalmsley Jan 15, 2024
9a00271
tweak logging
mwalmsley Jan 15, 2024
600ed3e
seems to block on multi-g
mwalmsley Jan 15, 2024
e0e56cd
maybe it's the datamodule
mwalmsley Jan 15, 2024
329a88c
num_workers=1, move test_trainer
mwalmsley Jan 15, 2024
de7562b
check if it is logging or wds
mwalmsley Jan 15, 2024
2b387c6
it ran. now logging back on
mwalmsley Jan 15, 2024
5c4c84b
oops, broke logging
mwalmsley Jan 16, 2024
53974da
fix prog bar
mwalmsley Jan 16, 2024
c8442db
try with aug fix
mwalmsley Jan 16, 2024
b652165
train effnetv2xl on evo
mwalmsley Jan 19, 2024
938d7ca
careful with nans
mwalmsley Jan 29, 2024
bb9322e
ft tweaks
mwalmsley Feb 1, 2024
0514b66
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Feb 1, 2024
122165d
prep for 2.0
mwalmsley Feb 5, 2024
efb39ef
Merge branch 'narval-migration' of github.com:mwalmsley/zoobot into n…
mwalmsley Feb 5, 2024
adcce7d
typo
mwalmsley Feb 5, 2024
3095b5a
maxvit fix
mwalmsley Feb 6, 2024
8ae3e5a
add sync batchnorm option
mwalmsley Feb 8, 2024
f354078
try sync again
mwalmsley Feb 9, 2024
164448c
fix encoder dim
mwalmsley Feb 11, 2024
f48f5ec
unit interval
mwalmsley Feb 26, 2024
797b748
typo
mwalmsley Feb 27, 2024
9eb0d89
fix typo
mwalmsley Feb 27, 2024
b2f8e39
add convnext support
mwalmsley Feb 27, 2024
772660d
docs
mwalmsley Feb 27, 2024
90c33f5
add cosine support
mwalmsley Mar 1, 2024
54f02bf
two fixes
mwalmsley Mar 1, 2024
741bda3
add cosine logging
mwalmsley Mar 2, 2024
9d8b791
simplify
mwalmsley Mar 2, 2024
09c70ba
pure pytorch?
mwalmsley Mar 2, 2024
e2c509e
cosine logging
mwalmsley Mar 2, 2024
3145542
partially reverse two fixes changes
mwalmsley Mar 2, 2024
f99dfd0
carefully start adding back
mwalmsley Mar 2, 2024
ce2c80a
try cosine uncommented but False
mwalmsley Mar 2, 2024
eaa98ce
add a warning
mwalmsley Mar 2, 2024
6b754b9
torch cosine
mwalmsley Mar 2, 2024
c14637e
remove mae
mwalmsley Mar 5, 2024
477c075
remove a little more
mwalmsley Mar 5, 2024
d185505
minor cleanup
mwalmsley Mar 8, 2024
15be580
Merge pull request #112 from mwalmsley/narval-migration
mwalmsley Mar 8, 2024
edc58bc
add from_scratch override
mwalmsley Mar 14, 2024
fa60f66
load 0-1 webdatasets, add note
mwalmsley Mar 15, 2024
cad9786
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Mar 15, 2024
0fe5cec
make models portable
mwalmsley Mar 19, 2024
e0fd96d
typo
mwalmsley Mar 19, 2024
bb6c403
try on colab
mwalmsley Mar 19, 2024
1c17b02
enforce 0-1 input
mwalmsley Mar 21, 2024
f3e3c14
Merge branch 'narval-migration' of https://github.com/mwalmsley/zoobo…
mwalmsley Mar 21, 2024
8543f5f
bump galaxy-datasets
mwalmsley Mar 21, 2024
48013c5
add webdataset
mwalmsley Mar 21, 2024
33b612d
require schema-like args
mwalmsley Mar 21, 2024
4b83e10
bump again
mwalmsley Mar 21, 2024
31d03ba
bump to python 3.9
mwalmsley Mar 21, 2024
a1ce097
Merge pull request #113 from mwalmsley/narval-migration
mwalmsley Mar 21, 2024
8e3498f
grab lightly cosine
mwalmsley Mar 29, 2024
573ad69
add docstrings throughout
mwalmsley Mar 30, 2024
a588d72
num_classes=0
mwalmsley Mar 30, 2024
a7bba6d
tinker to set cluster vars
mwalmsley Mar 30, 2024
0732853
require galaxy-datasets
mwalmsley Mar 30, 2024
0bffcb5
tiny tweak
mwalmsley Mar 31, 2024
d2e415a
add mae support
mwalmsley Mar 31, 2024
c8f8fa2
Merge branch 'dev' of github.com:mwalmsley/zoobot into dev
mwalmsley Mar 31, 2024
94ad1ee
small note
mwalmsley Mar 31, 2024
0f7b471
Merge branch 'dev' of github.com:mwalmsley/zoobot into dev
mwalmsley Mar 31, 2024
b1fbe2e
typo
mwalmsley Mar 31, 2024
2fd8775
change color->greyscale for consistency
mwalmsley Mar 31, 2024
5338b81
add cosine scheduler option
mwalmsley Mar 31, 2024
7eb780a
imports
mwalmsley Mar 31, 2024
6657c55
typo
mwalmsley Mar 31, 2024
cda966f
check tasks per node
mwalmsley Mar 31, 2024
1fa4c08
docs pass
mwalmsley Apr 3, 2024
c1744d0
Continue docs pass
mwalmsley Apr 3, 2024
e762128
docsing
mwalmsley Apr 3, 2024
08a66f3
readthedocs
mwalmsley Apr 3, 2024
579bcb4
m1
mwalmsley Apr 3, 2024
c9f6fa6
add sphinxemoji (vital)
mwalmsley Apr 3, 2024
30b79ca
typo
mwalmsley Apr 3, 2024
2f68c7e
rebuild
mwalmsley Apr 3, 2024
acfd970
final docs update?
mwalmsley Apr 4, 2024
dce3ec7
Merge branch 'main' into dev
mwalmsley Apr 4, 2024
e1b43c6
tiny tweaks
mwalmsley Apr 4, 2024
2 changes: 1 addition & 1 deletion .github/workflows/run_CI.yml
@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: true
matrix:
python-version: ["3.8", "3.9"] # zoobot should support these (many academics not on 3.9)
python-version: ["3.9"] # zoobot should support these
experimental: [false]
include:
- python-version: "3.10" # test the next python version but allow it to fail
3 changes: 2 additions & 1 deletion .gitignore
@@ -167,4 +167,5 @@ hparams.yaml

data/pretrained_models

*.tar
*.tar
*.ckpt
7 changes: 5 additions & 2 deletions .readthedocs.yaml
@@ -1,14 +1,17 @@
version: 2

build:
os: ubuntu-22.04
tools:
python: "3.9"

python:
version: 3.8
install:
- method: pip
path: .
extra_requirements:
- docs
- pytorch_m1
- tensorflow

sphinx:
fail_on_warning: true
14 changes: 0 additions & 14 deletions Dockerfile.tf

This file was deleted.

101 changes: 44 additions & 57 deletions README.md
@@ -17,46 +17,45 @@ Zoobot is trained using millions of answers by Galaxy Zoo volunteers. This code
- [Install](#installation)
- [Quickstart](#quickstart)
- [Worked Examples](#worked-examples)
- [Pretrained Weights](https://zoobot.readthedocs.io/en/latest/data_notes.html)
- [Pretrained Weights](https://zoobot.readthedocs.io/en/latest/pretrained_models.html)
- [Datasets](https://www.github.com/mwalmsley/galaxy-datasets)
- [Documentation](https://zoobot.readthedocs.io/) (for understanding/reference)

## Installation

<a name="installation"></a>

You can retrain Zoobot in the cloud with a free GPU using this [Google Colab notebook](https://colab.research.google.com/drive/17bb_KbA2J6yrIm4p4Ue_lEBHMNC1I9Jd?usp=sharing). To install locally, keep reading.
You can retrain Zoobot in the cloud with a free GPU using this [Google Colab notebook](https://colab.research.google.com/drive/1A_-M3Sz5maQmyfW2A7rEu-g_Zi0RMGz5?usp=sharing). To install locally, keep reading.

Download the code using git:

git clone git@github.com:mwalmsley/zoobot.git

And then pick one of the three commands below to install Zoobot and either PyTorch (recommended) or TensorFlow:
And then pick one of the three commands below to install Zoobot and PyTorch:

# Zoobot with PyTorch and a GPU. Requires CUDA 11.3.
pip install -e "zoobot[pytorch_cu113]" --extra-index-url https://download.pytorch.org/whl/cu113
# Zoobot with PyTorch and a GPU. Requires CUDA 12.1 (or CUDA 11.8, if you use `_cu118` instead)
pip install -e "zoobot[pytorch-cu121]" --extra-index-url https://download.pytorch.org/whl/cu121

# OR Zoobot with PyTorch and no GPU
pip install -e "zoobot[pytorch_cpu]" --extra-index-url https://download.pytorch.org/whl/cpu
pip install -e "zoobot[pytorch-cpu]" --extra-index-url https://download.pytorch.org/whl/cpu

# OR Zoobot with PyTorch on Mac with M1 chip
pip install -e "zoobot[pytorch_m1]"

# OR Zoobot with TensorFlow. Works with and without a GPU, but if you have a GPU, you need CUDA 11.2.
pip install -e "zoobot[tensorflow]
pip install -e "zoobot[pytorch-m1]"

This installs the downloaded Zoobot code using pip [editable mode](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs) so you can easily change the code locally. Zoobot is also available directly from pip (`pip install zoobot[option]`). Only use this if you are sure you won't be making changes to Zoobot itself. For Google Colab, use `pip install zoobot[pytorch_colab]`

To use a GPU, you must *already* have CUDA installed and matching the versions above.
I share my install steps [here](#install_cuda). GPUs are optional - Zoobot will retrain fine on CPU, just slower.

## Quickstart

<a name="quickstart"></a>

The [Colab notebook](https://colab.research.google.com/drive/17bb_KbA2J6yrIm4p4Ue_lEBHMNC1I9Jd?usp=sharing) is the quickest way to get started. Alternatively, the minimal example below illustrates how Zoobot works.
The [Colab notebook](https://colab.research.google.com/drive/1A_-M3Sz5maQmyfW2A7rEu-g_Zi0RMGz5?usp=sharing) is the quickest way to get started. Alternatively, the minimal example below illustrates how Zoobot works.

Let's say you want to find ringed galaxies and you have a small labelled dataset of 500 ringed or not-ringed galaxies. You can retrain Zoobot to find rings like so:

```python
```python

import pandas as pd
from galaxy_datasets.pytorch.galaxy_datamodule import GalaxyDataModule
@@ -77,11 +76,11 @@ Let's say you want to find ringed galaxies and you have a small labelled dataset
# retrain to find rings
trainer = finetune.get_trainer(save_dir)
trainer.fit(model, datamodule)
```
```

Then you can predict whether new galaxies have rings:

```python
```python
from zoobot.pytorch.predictions import predict_on_catalog

# csv with 'file_loc' column (path to image). Zoobot will predict the labels.
@@ -93,80 +92,66 @@ Then you can make predict if new galaxies have rings:
label_cols=['ring'], # only used for
save_loc='/your/path/finetuned_predictions.csv'
)
```
```
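
Once predictions are saved, filtering for likely rings is a one-liner. A minimal sketch (not part of Zoobot's API; the `ring` column name follows from `label_cols` above, and the 0.8 threshold is an arbitrary example):

```python
import pandas as pd

# Stand-in for pd.read_csv('/your/path/finetuned_predictions.csv')
preds = pd.DataFrame({
    'file_loc': ['a.jpg', 'b.jpg', 'c.jpg'],
    'ring': [0.92, 0.10, 0.71],  # finetuned model's predicted ring probability
})

likely_rings = preds[preds['ring'] > 0.8]  # keep only confident candidates
print(list(likely_rings['file_loc']))  # -> ['a.jpg']
```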

Zoobot includes many guides and working examples - see the [Getting Started](#getting-started) section below.

## Getting Started

<a name="getting_started"></a>

I suggest starting with the [Colab notebook](https://colab.research.google.com/drive/17bb_KbA2J6yrIm4p4Ue_lEBHMNC1I9Jd?usp=sharing) or the worked examples below, which you can copy and adapt.
I suggest starting with the [Colab notebook](https://colab.research.google.com/drive/1A_-M3Sz5maQmyfW2A7rEu-g_Zi0RMGz5?usp=sharing) or the worked examples below, which you can copy and adapt.

For context and explanation, see the [documentation](https://zoobot.readthedocs.io/).

For pretrained model weights, precalculated representations, catalogues, and so forth, see the [data notes](https://zoobot.readthedocs.io/en/latest/data_notes.html) in particular.
Pretrained models are listed [here](https://zoobot.readthedocs.io/en/latest/pretrained_models.html) and available on [HuggingFace](https://huggingface.co/collections/mwalmsley/zoobot-encoders-65fa14ae92911b173712b874)

### Worked Examples

<a name="worked_examples"></a>

PyTorch (recommended):

- [pytorch/examples/finetuning/finetune_binary_classification.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/pytorch/examples/finetuning/finetune_binary_classification.py)
- [pytorch/examples/finetuning/finetune_counts_full_tree.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/pytorch/examples/finetuning/finetune_counts_full_tree.py)
- [pytorch/examples/representations/get_representations.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/pytorch/examples/representations/get_representations.py)
- [pytorch/examples/train_model_on_catalog.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/pytorch/examples/train_model_on_catalog.py) (only necessary to train from scratch)

TensorFlow:
- [tensorflow/examples/train_model_on_catalog.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/train_model_on_catalog.py) (only necessary to train from scratch)
- [tensorflow/examples/make_predictions.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/make_predictions.py)
- [tensorflow/examples/finetune_minimal.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/finetune_minimal.py)
- [tensorflow/examples/finetune_advanced.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/finetune_advanced.py)

There is more explanation and an API reference on the [docs](https://zoobot.readthedocs.io/).

I also [include](https://github.com/mwalmsley/zoobot/blob/main/benchmarks) the scripts used to create and benchmark our pretrained models. Many pretrained models are available [already](https://zoobot.readthedocs.io/en/latest/data_notes.html), but if you need one trained on e.g. different input image sizes or with a specific architecture, I can probably make it for you.

When trained with a decision tree head (ZoobotTree, FinetuneableZoobotTree), Zoobot can learn from volunteer labels of varying confidence and predict posteriors for what the typical volunteer might say. Specifically, this Zoobot mode predicts the parameters for distributions, not simple class labels! For a demonstration of how to interpret these predictions, see the [gz_decals_data_release_analysis_demo.ipynb](https://github.com/mwalmsley/zoobot/blob/main/gz_decals_data_release_analysis_demo.ipynb).
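
As a toy illustration of what "parameters for distributions" means (plain Python, not Zoobot's actual output format - see the demo notebook for that): if the model predicts Dirichlet concentrations for a question's answers, the expected volunteer vote fraction for each answer is the Dirichlet mean, alpha_k / sum(alpha):

```python
def expected_vote_fractions(concentrations):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(concentrations)
    return [alpha / total for alpha in concentrations]

# Hypothetical predicted concentrations for ('smooth', 'featured', 'artifact')
alphas = [8.0, 3.0, 1.0]
print(expected_vote_fractions(alphas))  # -> approx [0.667, 0.25, 0.083]
```

Larger total concentration also means a more confident (lower-variance) prediction, which is why these outputs carry more information than simple class labels.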


### (Optional) Install PyTorch with CUDA

### (Optional) Install PyTorch or TensorFlow, with CUDA
<a name="install_cuda"></a>

*If you're not using a GPU, skip this step. Use the pytorch_cpu or tensorflow_cpu options in the section below.*
*If you're not using a GPU, skip this step. Use the pytorch-cpu option in the section below.*

Install PyTorch 1.12.1 or Tensorflow 2.10.0 and compatible CUDA drivers. I highly recommend using [conda](https://docs.conda.io/en/latest/miniconda.html) to do this. Conda will handle both creating a new virtual environment (`conda create`) and installing CUDA (`cudatoolkit`, `cudnn`)
Install PyTorch 2.1.0 or Tensorflow 2.10.0 and compatible CUDA drivers. I highly recommend using [conda](https://docs.conda.io/en/latest/miniconda.html) to do this. Conda will handle both creating a new virtual environment (`conda create`) and installing CUDA (`cudatoolkit`, `cudnn`)

CUDA 11.3 for PyTorch:
CUDA 12.1 for PyTorch 2.1.0:

conda create --name zoobot38_torch python==3.8
conda activate zoobot38_torch
conda install -c conda-forge cudatoolkit=11.3
conda create --name zoobot39_torch python==3.9
conda activate zoobot39_torch
conda install -c conda-forge cudatoolkit=12.1

CUDA 11.2 and CUDNN 8.1 for TensorFlow 2.10.0:
### Recent release features (v2.0.0)

conda create --name zoobot38_tf python==3.8
conda activate zoobot38_tf
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ # add this environment variable

### Latest minor features (v1.0.4)

- Now supports multi-class finetuning. See `pytorch/examples/finetuning/finetune_multiclass_classification.py`
- Removed `simplejpeg` dependency due to M1 install issue.
- Pinned `timm` version to ensure MaX-ViT models load correctly. Models supporting the latest `timm` will follow.
- (internal until published) GZ Evo v2 now includes Cosmic Dawn (HSC). Significant performance improvement on HSC finetuning.

### Latest major features (v1.0.0)

v1.0.0 recognises that most of the complexity in this repo is training Zoobot from scratch, but most non-GZ users will probably simply want to load the pretrained Zoobot and finetune it on their data.

- Adds new finetuning interface (`finetune.run_finetuning()`), examples.
- Refocuses docs on finetuning rather than training from scratch.
- Rework installation process to separate CUDA from Zoobot (simpler, easier)
- Better wandb logging throughout, to monitor training
- Remove need to make TFRecords. Now TF directly uses images.
- Refactor out augmentations and datasets to `galaxy-datasets` repo. TF and Torch now use identical augmentations (via albumentations).
- Many small quality-of-life improvements
- New pretrained architectures: ConvNeXT, EfficientNetV2, MaxViT, and more. Each in several sizes.
- Reworked finetuning procedure. All these architectures are finetuneable through a common method.
- Reworked finetuning options. Batch norm finetuning removed. Cosine schedule option added.
- Reworked finetuning saving/loading. Auto-downloads encoder from HuggingFace.
- Now supports regression finetuning (as well as multi-class and binary). See `pytorch/examples/finetuning`
- Updated `timm` to 0.9.10, allowing latest model architectures. Previously downloaded checkpoints may not load correctly!
- (internal until published) GZ Evo v2 now includes Cosmic Dawn (HSC H2O). Significant performance improvement on HSC finetuning. Also now includes GZ UKIDSS (dragged from our archives).
- Updated `pytorch` to `2.1.0`
- Added support for webdatasets (only recommended for large-scale distributed training)
- Improved per-question logging when training from scratch
- Added option to compile encoder for max speed (not recommended for finetuning, only for pretraining).
- Deprecates TensorFlow. The CS research community focuses on PyTorch and new frameworks like JAX.
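
The cosine schedule option mentioned above follows the usual half-cosine decay. A quick sketch of the shape (illustrative only; Zoobot wraps a library implementation rather than this exact function):

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate: decays smoothly from base_lr to min_lr."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 100, 1e-3))    # starts at base_lr
print(cosine_lr(50, 100, 1e-3))   # halfway: half of base_lr
print(cosine_lr(100, 100, 1e-3))  # ends at min_lr
```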

Contributions are very welcome and will be credited in any future work. Please get in touch! See [CONTRIBUTING.md](https://github.com/mwalmsley/zoobot/blob/main/benchmarks) for more.

@@ -176,6 +161,8 @@ The [benchmarks](https://github.com/mwalmsley/zoobot/blob/main/benchmarks) folde

Training Zoobot using the GZ DECaLS dataset option will create models very similar to those used for the GZ DECaLS catalogue and shared with the early versions of this repo. The GZ DESI Zoobot model is trained on additional data (GZD-1, GZD-2), as is the GZ Evo Zoobot model (GZD-1/2/5, Hubble, Candels, GZ2).

**Pretraining is becoming increasingly complex and is now partially refactored out to a separate repository. We are gradually migrating this `zoobot` repository to focus on finetuning.**

### Citing

If you use this software, or otherwise wish to cite Zoobot as a software package, please use the [JOSS paper](https://doi.org/10.21105/joss.05312):
@@ -189,10 +176,10 @@ You might be interested in reading papers using Zoobot:
- [Practical Galaxy Morphology Tools from Deep Supervised Representation Learning](https://arxiv.org/abs/2110.12735) (2022)
- [Towards Foundation Models for Galaxy Morphology](https://arxiv.org/abs/2206.11927) (2022)
- [Harnessing the Hubble Space Telescope Archives: A Catalogue of 21,926 Interacting Galaxies](https://arxiv.org/abs/2303.00366) (2023)
- [Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies](https://arxiv.org/abs/2309.08660) (2023)
- [Galaxy Zoo DESI: Detailed morphology measurements for 8.7M galaxies in the DESI Legacy Imaging Surveys](https://academic.oup.com/mnras/advance-article/doi/10.1093/mnras/stad2919/7283169?login=false) (2023)
- [Galaxy mergers in Subaru HSC-SSP: A deep representation learning approach for identification, and the role of environment on merger incidence](https://doi.org/10.1051/0004-6361/202346743) (2023)

<!-- submitted papers: simulated merger classification, unsupervised anomaly detection, starforming clump localisation, and morphological segmentation. -->
- [Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies](https://arxiv.org/abs/2309.08660) (2023, submitted)
- [Transfer learning for galaxy feature detection: Finding Giant Star-forming Clumps in low redshift galaxies using Faster R-CNN](https://arxiv.org/abs/2312.03503) (2023)
- [Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning](https://arxiv.org/abs/2402.10187) (2024, submitted)

Many other works use Zoobot indirectly via the [Galaxy Zoo DECaLS](https://arxiv.org/abs/2102.08414) catalog (and now via the new [Galaxy Zoo DESI](https://academic.oup.com/mnras/advance-article/doi/10.1093/mnras/stad2919/7283169?login=false) catalog).
17 changes: 9 additions & 8 deletions benchmarks/pytorch/run_benchmarks.sh
@@ -13,11 +13,11 @@ SEED=$RANDOM


# GZ Evo i.e. all galaxies
# effnet, greyscale and color
# sbatch --job-name=evo_py_gr_eff_224_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# sbatch --job-name=evo_py_gr_eff_300_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=256,RESIZE_AFTER_CROP=300,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# sbatch --job-name=evo_py_co_eff_224_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,COLOR_STRING=--color,GPUS=2,SEED=$SEED $TRAIN_JOB
# sbatch --job-name=evo_py_co_eff_300_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=128,RESIZE_AFTER_CROP=300,DATASET=gz_evo,COLOR_STRING=--color,GPUS=2,SEED=$SEED $TRAIN_JOB
# effnet, greyscale and color, 224 and 300px
sbatch --job-name=evo_py_gr_eff_224_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
sbatch --job-name=evo_py_gr_eff_300_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=256,RESIZE_AFTER_CROP=300,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
sbatch --job-name=evo_py_co_eff_224_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,COLOR_STRING=--color,GPUS=2,SEED=$SEED $TRAIN_JOB
sbatch --job-name=evo_py_co_eff_300_$SEED --export=ARCHITECTURE=efficientnet_b0,BATCH_SIZE=128,RESIZE_AFTER_CROP=300,DATASET=gz_evo,COLOR_STRING=--color,GPUS=2,SEED=$SEED $TRAIN_JOB

# and resnet18
# sbatch --job-name=evo_py_gr_res18_224_$SEED --export=ARCHITECTURE=resnet18,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
@@ -26,7 +26,7 @@ SEED=$RANDOM
# sbatch --job-name=evo_py_gr_res50_224_$SEED --export=ARCHITECTURE=resnet50,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# sbatch --job-name=evo_py_gr_res50_300_$SEED --export=ARCHITECTURE=resnet50,BATCH_SIZE=256,RESIZE_AFTER_CROP=300,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# color 224 version
sbatch --job-name=evo_py_co_res50_224_$SEED --export=ARCHITECTURE=resnet50,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,COLOR_STRING=--color,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# sbatch --job-name=evo_py_co_res50_224_$SEED --export=ARCHITECTURE=resnet50,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,COLOR_STRING=--color,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB

# and with max-vit tiny because hey transformers are cool
# smaller batch size due to memory
@@ -35,11 +35,12 @@ sbatch --job-name=evo_py_co_res50_224_$SEED --export=ARCHITECTURE=resnet50,BATCH

# and max-vit small (works badly)
# sbatch --job-name=evo_py_gr_vitsmall_224_$SEED --export=ARCHITECTURE=maxvit_small_224,BATCH_SIZE=64,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# and convnext (works badly)
# and convnext (works badly, would really like to try again but bigger)
# sbatch --job-name=evo_py_gr_$SEED --export=ARCHITECTURE=convnext_nano,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB
# and vit
# sbatch --job-name=evo_py_gr_vittinyp16_224_$SEED --export=ARCHITECTURE=vit_tiny_patch16_224,BATCH_SIZE=128,RESIZE_AFTER_CROP=224,DATASET=gz_evo,MIXED_PRECISION_STRING=--mixed-precision,GPUS=2,SEED=$SEED $TRAIN_JOB

# and swinv2
# TODO

# and in color with no mixed precision, for specific project
# sbatch --job-name=evo_py_co_res50_224_fullprec_$SEED --export=ARCHITECTURE=resnet50,BATCH_SIZE=256,RESIZE_AFTER_CROP=224,DATASET=gz_evo,COLOR_STRING=--color,GPUS=2,SEED=$SEED $TRAIN_JOB
11 changes: 0 additions & 11 deletions docker-compose-tf.yml

This file was deleted.

15 changes: 0 additions & 15 deletions docs/autodoc/api.rst

This file was deleted.