random assorted nonsense

  • apparently the model copes relatively well with input tokens being replaced by <pad>, even when short_ctx_dropout_p is not set above 0
  • training refuses to converge on Google's TPUs, cause still unknown
    • enabling full matmul precision with torch_xla._XLAC._xla_set_use_full_mat_mul_precision(True) does not fix it
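
For the <pad> observation above, a rough sketch of what such a replacement check could look like. The notes do not say how the tokens were replaced, so the per-position independent replacement here is an assumption, and `PAD_ID` and `replace_with_pad` are hypothetical names rather than anything from the actual codebase. It is also not meant to reflect how short_ctx_dropout_p is implemented.

```python
import random

PAD_ID = 0  # hypothetical pad token id; the real vocabulary may use another value


def replace_with_pad(token_ids, p, pad_id=PAD_ID, rng=None):
    """Return a copy of token_ids with each position independently
    replaced by pad_id with probability p (assumed replacement scheme)."""
    if rng is None:
        rng = random.Random()
    return [pad_id if rng.random() < p else tok for tok in token_ids]


tokens = [17, 42, 5, 99, 23, 8]
# Seeded RNG so the corrupted sequence is reproducible between runs.
corrupted = replace_with_pad(tokens, p=0.5, rng=random.Random(0))
```

Feeding `corrupted` to the model in place of `tokens` and comparing the loss at a few values of `p` would be one way to quantify "relatively fine".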