08 May 20:30

jlewitt1

ecf874f

v0.0.27 Latest

Highlights

Custom cluster default env support and lots of new examples!

Cluster Default Env

The Runhouse cluster factory now supports a default_env argument, giving you more flexibility and isolation. When you set up a cluster with a default env, Runhouse first installs the env on the cluster (any package installations and setup commands), then starts the Runhouse server inside that env, whether it is a bare-metal env or a conda env. Subsequent Runhouse calls on or to the cluster, such as cluster.run(cmd) and rh.function(local_fn).to(cluster), run in this default env unless told otherwise. Simply pass any Runhouse Env object, including its package requirements, setup commands, working dir, and so on, to the cluster factory.

import runhouse as rh

my_default_env = rh.env(
    name="my_default_env",
    reqs=["pytest", "numpy"],
    working_dir="./",
)
my_conda_env = rh.conda_env(name="conda_env1", env_vars={...})  # conda env

cluster = rh.cluster(
    name="my_cluster",
    instance_type="CPU:2+",
    default_env=my_default_env,   # or my_conda_env
)

cluster.run(["pip freeze | grep pytest"])  # runs on default_env
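The ordering described above (install the env first, start the server inside it, then route calls to it by default) can be modeled in a few lines of plain Python. This is a toy sketch, not Runhouse's implementation: EnvSpec, ToyCluster, start, and resolve_env are hypothetical names, and the "pip install" strings only stand in for whatever installation Runhouse actually performs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EnvSpec:
    """Hypothetical stand-in for a Runhouse Env (not the real rh.env class)."""
    name: str
    reqs: List[str] = field(default_factory=list)
    setup_cmds: List[str] = field(default_factory=list)


@dataclass
class ToyCluster:
    """Toy model of the default-env flow; records steps instead of running them."""
    name: str
    default_env: EnvSpec
    log: List[str] = field(default_factory=list)

    def start(self) -> None:
        # Step 1: install the env (package requirements, then setup commands).
        for req in self.default_env.reqs:
            self.log.append(f"pip install {req}")
        self.log.extend(self.default_env.setup_cmds)
        # Step 2: only then start the server inside that env.
        self.log.append(f"start server in {self.default_env.name}")

    def resolve_env(self, env: Optional[EnvSpec] = None) -> str:
        # Calls that do not name an env fall back to the cluster's default env.
        return env.name if env is not None else self.default_env.name


cluster = ToyCluster("my_cluster", EnvSpec("my_default_env", reqs=["pytest", "numpy"]))
cluster.start()
print(cluster.log)            # install steps first, server start last
print(cluster.resolve_env())  # my_default_env
```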

Improvements

Introduce support for custom cluster default env (#678, #746, #760)
Start our own Ray cluster instead of using SkyPilot's (#742)
Exception handling for Module (#747)
Disable timeout in AsyncClient (#773)
Only sync rh config to ondemand cluster (#782)

Bug Fixes

Set CPUs for ClusterServlet to 0 (#772)
- Previously, the cluster servlet reserved 1 CPU resource on the cluster; it now reserves 0.
Set den_auth default to None in cluster factory (#784)
- A non-None default caused the cluster to be reconstructed from scratch (rather than reloaded from RNS) whenever that default mismatched the saved value.
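The reasoning behind #784 can be sketched with a hypothetical needs_reconstruct check that compares only explicitly passed (non-None) arguments against a saved config. This illustrates why a None default avoids spurious rebuilds; it is not Runhouse's actual reload logic, and the False default below is a made-up example value.

```python
from typing import Any, Dict


def needs_reconstruct(saved: Dict[str, Any], **passed: Any) -> bool:
    """Hypothetical check: rebuild only if a non-None argument disagrees with the saved config."""
    for key, value in passed.items():
        if value is None:
            # None means "not specified": keep whatever was saved.
            continue
        if saved.get(key) != value:
            return True
    return False


saved = {"name": "my_cluster", "den_auth": True}

# If den_auth defaulted to, say, False, reloading by name alone would look
# like a mismatch and force a rebuild:
print(needs_reconstruct(saved, name="my_cluster", den_auth=False))  # True
# With den_auth defaulting to None, reloading by name reuses the saved config:
print(needs_reconstruct(saved, name="my_cluster", den_auth=None))   # False
```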

Docs & Examples

New Examples

Llama3 (#741, #743, #744)
Parallel embedding (#759, #779, #783, #792)
Hyperparameter optimization (#770)
Llama2 fine-tuning with LoRA (#771)

New Tutorials

Async tutorial in docs (#768)

26 Apr 16:25

jlewitt1

4cbb5c3

v0.0.26

Fast-follow Ray bugfix

Bugfixes

Start our own Ray cluster instead of using SkyPilot's (#742)

16 Apr 19:36

dongreenberg

3c84253

v0.0.25

Improved parallelism, clearer exceptions, and saving resources within Den orgs

Improvements

Improve the thread, reference, and fault tolerance model for EnvServlet Ray actors (#735, #733, #736, #734, #737)
Catch all non-deserializable exceptions client-side (#730)
Support for saving resources on behalf of an org (#676, #732)
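One way a client can avoid crashing on exceptions it cannot deserialize (#730) is to fall back to the remote traceback text. The sketch below assumes a hypothetical payload with serialized_exception (pickled bytes) and traceback (string) fields; Runhouse's real wire format may differ.

```python
import pickle
import traceback
from typing import Any, Dict


def raise_remote_exception(payload: Dict[str, Any]) -> None:
    """Re-raise a remote exception, or fall back to its traceback text."""
    try:
        exc = pickle.loads(payload["serialized_exception"])
    except Exception as deser_err:
        # e.g. the exception class comes from a library installed only on the cluster
        raise RuntimeError(
            "Remote call failed and its exception could not be deserialized "
            "locally. Remote traceback:\n" + payload["traceback"]
        ) from deser_err
    raise exc


# Simulate a payload whose exception can be loaded on the client.
try:
    {}["missing"]
except KeyError as err:
    ok_payload = {"serialized_exception": pickle.dumps(err), "traceback": traceback.format_exc()}

# Simulate a payload whose bytes cannot be loaded on the client.
bad_payload = {"serialized_exception": b"", "traceback": "Traceback (most recent call last): ..."}

for payload in (ok_payload, bad_payload):
    try:
        raise_remote_exception(payload)
    except Exception as exc:
        print(type(exc).__name__)  # KeyError, then RuntimeError
```

Only unpickle payloads from a trusted source, such as your own cluster.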

Bugfixes

Dynamically set API_SERVER_URL (#708)
Move OMP_NUM_THREADS setting into servlet to avoid setting it on import (#731)

Full Changelog: v0.0.24...v0.0.25

12 Apr 15:10

dongreenberg

6b7cfea

v0.0.24

Fast-follow bugfixes for CPU parallelism and log streaming

Bug fixes

Fix Ray persistently setting OMP_NUM_THREADS=1 (#723)
Fix method call log streaming by unbuffering stdout/err in call threadpool (#724)

Full Changelog: v0.0.23...v0.0.24

11 Apr 21:51

dongreenberg

94b2772

v0.0.23

Richer async support, performance improvements, and bugfixes

Improvements

Client-side Async support (#690, #696, #696, #689) - We've improved the way we handle async calls to remote modules. Now, you can properly unblock the event loop and await any remote call by passing run_async as an argument into the method call. If your method is already defined as async, this is applied automatically without specifying run_async, so your code can await the remote method just as it did the original. You can still explicitly set run_async=False in that case to make the local call sync.
Improve Mapper ergonomics and docs (#700, #709) - Now you can simply pass a function to the mapper and it will send over the module and create replicas on its own. We'll publish new mapper tutorials shortly.
Cache rich signature for Module to improve method call performance (#699)
Don't serialize tracebacks in OutputType.EXCEPTION (#721) - Sometimes exceptions can't be deserialized locally because they depend on remote libraries. In those cases, we now still print the traceback for better visibility.
Unset OMP_NUM_THREADS when Ray sets it automatically, since it may break user parallelism expectations (#719)
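The run_async behavior described in the first item can be modeled locally with asyncio. In this sketch, make_remote_method is a hypothetical wrapper (not Runhouse's client code) and the "remote" call is simulated by running the function in a worker thread: with run_async=True, or by default for async def methods, the call returns an awaitable and leaves the event loop free, while run_async=False forces a blocking call. It assumes Python 3.9+ for asyncio.to_thread.

```python
import asyncio
import inspect
from typing import Any, Callable, Optional


def make_remote_method(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap fn so callers can choose sync or async invocation via run_async."""
    # Mirror the release note: async methods default to async calls.
    default_async = inspect.iscoroutinefunction(fn)

    def call(*args: Any, run_async: Optional[bool] = None, **kwargs: Any) -> Any:
        use_async = default_async if run_async is None else run_async

        def blocking_call() -> Any:
            # Stand-in for the blocking network round trip to the cluster.
            result = fn(*args, **kwargs)
            if inspect.iscoroutine(result):
                result = asyncio.run(result)
            return result

        if use_async:
            # A coroutine the caller can await; the work runs in a worker thread.
            return asyncio.to_thread(blocking_call)
        return blocking_call()

    return call


def add(a: int, b: int) -> int:
    return a + b


async def async_add(a: int, b: int) -> int:
    await asyncio.sleep(0.01)
    return a + b


remote_add = make_remote_method(add)
remote_async_add = make_remote_method(async_add)


async def main() -> None:
    print(await remote_add(1, 2, run_async=True))  # 3
    print(await remote_async_add(3, 4))            # 7 (async by default)


asyncio.run(main())
print(remote_async_add(5, 6, run_async=False))     # 11 (forced sync)
```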

Bugfixes

Fix stdout and logs streaming in various scenarios (#716, #717)
Remove unused requests.Session created in HTTPClient (#694)
Change Caddy installation to download from GitHub (#702) (Sorry Caddy!)
Inherit Cluster READ access for resources on the cluster (#706)
Set the cluster name in the HTTPClient upon rename (#704)
Fix some runhouse login bugs (#717)
Make errors from Den include status code and be more verbose (#707)
Fix SkySSHRunner tunnels and processes to be correctly cleaned up (#718)

Full Changelog: v0.0.22...v0.0.23

02 Apr 19:58

carolineechen

a30e577

v0.0.22

Performance improvements + bug fixes

Improvements

Add to open_ports when creating a new on-demand cluster (#651)
Updates to SageMaker Cluster (#654)
Change AuthCache logic to request per keypair (#684)

Performance Improvements

Cache various module/function computations (#661, #665, #662)
Async daemon side components (#656, #664, #673, #674, #670)
Use ThreadPoolExecutor to run synchronous function calls on the server side (#663)
Decrease log wait time (#685)
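The ThreadPoolExecutor change (#663) follows a standard asyncio pattern: an async server hands blocking, synchronous user functions to a thread pool with loop.run_in_executor, so the event loop keeps serving other requests. The sketch below shows that pattern in isolation; the pool size and function names are illustrative, not Runhouse's.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

# Illustrative pool size; not Runhouse's setting.
_executor = ThreadPoolExecutor(max_workers=4)


async def call_sync_in_pool(fn: Callable[..., Any], *args: Any) -> Any:
    """Run a blocking function in the thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, fn, *args)


def blocking_square(x: int) -> int:
    time.sleep(0.2)  # simulate a slow synchronous user function
    return x * x


async def handle_requests() -> List[int]:
    # Four concurrent "requests" finish in about one sleep interval, not four.
    return await asyncio.gather(*(call_sync_in_pool(blocking_square, i) for i in range(4)))


print(asyncio.run(handle_requests()))  # [0, 1, 4, 9]
```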

Bug Fixes

Fix bug with json serialization for exceptions (#655)
- Update returned exceptions to be json serializable.
Use shell for running cmd in env servlet (#667)
- Previously shell commands would not consistently work.
Fix cluster autostop (#672, #681, #683)
- Correctly set and update the last activity time, and do so in a background thread.
Fix multinode cluster IPs (#681)
- Cluster IPs were previously computed from cached IPs and could include stale ones. Now only current IPs are used.

Examples

Add Llama2 on Inferentia with TGI example (#649)
Update Inferentia examples to use the DL AMI (#677)

21 Mar 21:33

carolineechen

0cf1e88

v0.0.21

Some performance and feature improvements, bug fixes, and new examples.

Improvements

OpenAPI pages for cluster (#579, #586, #587, #589, #590)
Properly raise exceptions in Module's load_config when dependency is missing (#595)
Kill Ray actors by default during runhouse stop (#596)
module.to(rh.here) throws an error if the local server is not initialized (#597)
Send exceptions in data field (#602)
Run commands inside env servlet (#603)
Return exceptions instead of None in failed mapper replicas (#605)
Remove sshtunnel library dependency (#625, #634, #640)
Don't save cluster secret during cluster init (#633)
Remove creds from cluster's config file (#637)

Performance

Use check_server instead of is_up with refresh for ondemand cluster endpoint (#614)
Remove register_activity calls within env servlet (#629)

Bug Fixes

Install aws dependencies properly for runhouse[aws] (#613)
Fix env servlet name in put_resource (#626)
- Env servlet was using conda env name instead of env resource name.
Fix SkySSHRunner local and remote port ordering (#630)

BC-Breaking

Remove previously deprecated items (#624)
- reqs and setup_cmds in rh.function.to() removed. Pass it into the env instead.
- access_type removed in Resource and share. Use access_level instead.
- global pinning methods removed. Use rh.here.put/get/delete/keys/clear instead.
Deprecate and raise exception for passing system into function/module factories (#625)
- Passing in system to rh.function/module does not send code to the system and can be misleading. Use .to or get_or_to to sync code to the cluster.

Examples

See rendered examples on https://www.run.house/examples

New Examples

Mistral 7B Inference with TGI on AWS EC2 (#585, #604)
Mistral 7B Inference on AWS Inferentia (#609)
Langchain RAG App on AWS EC2, with Custom Domain (#607, #621)
Llama2 on EC2 A10G (#608)
Llama2 Inference with TGI on AWS EC2 A10G (#610)

Updates

Add READMEs to GitHub (#612, #619)
Avoid reinstall for envs and extra imports in examples (#616, #618)

07 Mar 21:40

carolineechen

f7b978c

v0.0.20

Highlights

Cluster Sharing

We’ve made it easier to share clusters across different environments and with other users. You can now share and load a cluster just as you would any other resource.

import runhouse as rh

my_cluster = rh.cluster("rh-cluster", ips=[...], ...)
my_cluster.share(["user1@email.com", "username2"])

# load the box with
shared_cluster = rh.cluster("owner_username/rh-cluster")

Shared users will be able to seamlessly run shared apps on that cluster, or SSH directly onto the remote box. To enable this, we persist the SSH credentials for the cluster as a Runhouse Secret object, which can easily be reloaded when another user tries to connect.

Improved rh.Mapper

rh.Mapper, first introduced in Runhouse v0.0.15, extends functions and modules to handle mapping, replication, and load balancing. This release includes further improvements and some bug fixes, plus a BC-breaking argument rename (see the BC-Breaking section below).

import runhouse as rh

def local_sum(arg1, arg2, arg3):
    return arg1 + arg2 + arg3

remote_fn = rh.function(local_sum).to(my_cluster)
mapper = rh.mapper(remote_fn, replicas=2)
mapper.map([1, 2], [1, 4], [2, 3])
# output: [4, 9]
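The [4, 9] result follows from how the arguments are grouped: the i-th call receives the i-th element of each list, i.e. local_sum(1, 1, 2) and local_sum(2, 4, 3). The local toy_map below is a hypothetical sketch that reproduces those semantics, with single-thread pools standing in for replicas and round-robin assignment; it does not model how rh.mapper actually schedules or load-balances work on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Any, Callable, List, Sequence


def toy_map(fn: Callable[..., Any], *arg_lists: Sequence[Any], replicas: int = 2) -> List[Any]:
    """Call fn once per position across arg_lists, spreading calls over replicas."""
    calls = list(zip(*arg_lists))  # [(1, 1, 2), (2, 4, 3)] for the example above
    # Each single-worker pool stands in for one replica.
    pools = [ThreadPoolExecutor(max_workers=1) for _ in range(replicas)]
    try:
        # Assign calls to replicas round-robin and keep results in input order.
        futures = [pool.submit(fn, *args) for pool, args in zip(cycle(pools), calls)]
        return [future.result() for future in futures]
    finally:
        for pool in pools:
            pool.shutdown()


def local_sum(arg1, arg2, arg3):
    return arg1 + arg2 + arg3


print(toy_map(local_sum, [1, 2], [1, 4], [2, 3]))  # [4, 9]
```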

Improvements

Use hashed subtoken for cluster requests (#270)
Simplify storage of SSH creds for more reliable cluster access across environments and users (#479)
Remove sky storage dependency (#415)
Replace subprocess check_call with run (#503)
Serialize exceptions properly (#516)
Improved Logging
- Only write out execution logs if stream_logs is set (#490)
- Propagate logs from pip installs on cluster (#505)
- Write some logs to sys.stdout (#519)

Bug Fixes

Mapper bug fixes (#539)

Deprecation

Rename the config_for_rns property to a config function (#553, #554, #555)

BC-Breaking

rh.mapper factory function args renaming
- num_replicas -> replicas
- replicas -> concurrency
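Because replicas still exists under a new meaning, code written against the old argument names can silently change behavior rather than fail. A hypothetical helper (not part of Runhouse) for migrating old keyword arguments shows why the renames must be applied in the right order:

```python
from typing import Any, Dict


def migrate_mapper_kwargs(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite pre-v0.0.20 rh.mapper kwargs to the new names."""
    new = dict(old)
    # Order matters: move the old `replicas` to `concurrency` first,
    # before `num_replicas` takes over the `replicas` name.
    if "replicas" in new:
        new["concurrency"] = new.pop("replicas")
    if "num_replicas" in new:
        new["replicas"] = new.pop("num_replicas")
    return new


print(migrate_mapper_kwargs({"num_replicas": 4, "replicas": 2}))
# {'concurrency': 2, 'replicas': 4}
```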

Docs

See updated tutorials in the Runhouse docs

New quick start guides -- local, cloud, and Den versions
Updated API tutorials -- clusters, functions & modules, envs, folders

Examples

See new Runhouse examples on GitHub or the Runhouse website

Llama2 inference on AWS EC2
Stable Diffusion XL 1.0 on AWS EC2
Stable Diffusion XL 1.0 on AWS Inferentia

Other

Remove paramiko as server connection type

15 Feb 22:04

rohinb2

9ad897c

v0.0.19

Minor bug fix release
Fix imports breaking in Python 3.8
Fix loading public functions by name

15 Feb 15:02

carolineechen

169cff2

v0.0.18

Highlights

Runhouse Local Mode and rh.here

Previously, the Runhouse server was designed strictly for deploying apps to it remotely with my_module.to(my_cluster). Now, you can also start the Runhouse server daemon directly and deploy to it locally, like a traditional web server. Access the local daemon's Cluster object in Python with rh.here. rh.here always refers to the locally running daemon, so you can also use it within an existing Runhouse cluster.

Start your local Runhouse server:

$ runhouse restart
$ runhouse status

To send a module:

def concat(a, b):
    return a+b

import runhouse as rh
rh.function(concat).to(rh.here)

To try out your service:

curl -X "GET" 'http://localhost:32300/concat/call?a=run&b=house'

>>> {"data":"\"runhouse\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}
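As the response shows, the result is double-encoded: the outer body is JSON, and because serialization is "json", the data field is itself a JSON string. A minimal client-side decoder, written against the fields in this example only (treating a non-null error as a failure is an assumption), could look like this:

```python
import json
from typing import Any

# Response body copied from the curl example above.
RESPONSE = r'{"data":"\"runhouse\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}'


def decode_result(body: str) -> Any:
    envelope = json.loads(body)
    # Assumption: a non-null "error" means the call failed.
    if envelope.get("error") is not None:
        raise RuntimeError(f"{envelope['error']}\n{envelope.get('traceback') or ''}")
    if envelope.get("serialization") == "json":
        # "data" holds the JSON-encoded result, so decode it a second time.
        return json.loads(envelope["data"])
    return envelope["data"]


print(decode_result(RESPONSE))  # runhouse
```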

This is also particularly useful for debugging. You can ssh onto your cluster, start a Python shell, and run methods like rh.here.call("my_module", "my_method") to test or analyze your deployed module's behavior or contents quickly.

Replace nginx with Caddy

Runhouse now uses Caddy as a reverse proxy for the Runhouse server launched on clusters. Caddy also automatically generates and renews self-signed certificates, making it easy to secure your cluster with HTTPS right out of the box.

Improvements

Improved logging to reduce log clutter and differentiate local and cluster logs (#436, #475)
Support packages using setup.cfg (#456)
Runhouse status updates (#462, #469)

Build

Remove Sky dependency for SSH command runner

Bug Fixes

Fix name to properly be updated in cluster when saved (#451, #477)
Fix bug in sagemaker cluster factory (#459)
Fix Cluster.from_name to properly load existing config in Den (#468)
Fix CLI runhouse status for on-demand cluster (#478)

BC-Breaking

reqs and setup_cmds removed from function .to (#373)
Generator module now returns generator rather than streamed results (#373)

Other

Refactor and new methods for obj store (#373)
Replace nginx with Caddy (#406)
Set unique SSH control path

Releases: run-house/runhouse
