
Releases: FedML-AI/FedML

FedML 0.8.9

28 Oct 04:45
f352cfb

FEDML Open Source: A Unified and Scalable Machine Learning Library for Running Training and Deployment Anywhere at Any Scale

Backed by FEDML Nexus AI: Next-Gen Cloud Services for LLMs & Generative AI (https://nexus.fedml.ai)

FedML Documentation: https://doc.fedml.ai

FedML Homepage: https://fedml.ai/
FedML Blog: https://blog.fedml.ai/
FedML Medium: https://medium.com/@FedML
FedML Research: https://fedml.ai/research-papers/

Join the Community:
Slack: https://join.slack.com/t/fedml/shared_invite/zt-havwx1ee-a1xfOUrATNfc9DFqU~r34w
Discord: https://discord.gg/9xkW8ae6RV

FEDML® stands for Foundational Ecosystem Design for Machine Learning. FEDML Nexus AI is the next-gen cloud service for LLMs & Generative AI. It helps developers launch complex model training, deployment, and federated learning anywhere (on decentralized GPUs, multi-clouds, edge servers, and smartphones) easily, economically, and securely.

Highly integrated with FEDML open source library, FEDML Nexus AI provides holistic support of three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI jobs across GPU Clouds.

[Figure: FEDML Nexus AI platform overview]

A typical workflow is shown in the figure above. When a developer wants to run a pre-built job from Studio or the Job Store, FEDML®Launch swiftly pairs the AI job with the most economical GPU resources, auto-provisions them, and runs the job, eliminating complex environment setup and management. While the job runs, FEDML®Launch orchestrates the compute plane across different cluster topologies and configurations, enabling any complex AI job, whether model training, deployment, or even federated learning. FEDML®Open Source is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.
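The Launch workflow above is driven by a job description file. The sketch below is illustrative only; field names and values are assumptions, so consult https://doc.fedml.ai for the actual schema supported by your FedML version.

```yaml
# Hypothetical fedml launch job description (illustrative field names).
workspace: ./my_job          # local folder uploaded with the job
job: |
  python3 train.py           # entry command executed on the matched GPUs
computing:
  minimum_num_gpus: 1        # pair the job with at least this many GPUs
  maximum_cost_per_hour: $1.75
  resource_type: A100-80G
```

A job like this would then be submitted with a command along the lines of `fedml launch job.yaml`, after which Launch handles provisioning and scheduling.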

In the MLOps layer of FEDML Nexus AI

  • FEDML® Studio embraces the power of Generative AI! Access popular open-source foundational models (e.g., LLMs), fine-tune them seamlessly with your specific data, and deploy them scalably and cost-effectively using the FEDML Launch on GPU marketplace.
  • FEDML® Job Store maintains a list of pre-built jobs for training, deployment, and federated learning. Developers are encouraged to run them directly with customized datasets or models on cheaper GPUs.

In the scheduler layer of FEDML Nexus AI

  • FEDML® Launch swiftly pairs AI jobs with the most economical GPU resources, auto-provisions, and effortlessly runs the job, eliminating complex environment setup and management. It supports a range of compute-intensive jobs for generative AI and LLMs, such as large-scale training, serverless deployments, and vector DB searches. FEDML Launch also facilitates on-prem cluster management and deployment on private or hybrid clouds.

In the compute layer of FEDML Nexus AI

  • FEDML® Deploy is a model serving platform for high scalability and low latency.
  • FEDML® Train focuses on distributed training of large and foundational models.
  • FEDML® Federate is a federated learning platform backed by the most popular federated learning open-source library and the world’s first FLOps (federated learning Ops), offering on-device training on smartphones and cross-cloud GPU servers.
  • FEDML® Open Source is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.
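Federated learning across these platforms ultimately reduces to aggregating locally trained client updates on a server. A minimal FedAvg-style weighted average can be sketched in plain Python; this is a conceptual sketch and uses no FedML APIs:

```python
def fedavg(updates):
    """Weighted average of client model updates (FedAvg).

    updates: list of (num_samples, weights) pairs, where weights is a
    flat list of floats representing one client's model parameters.
    Clients with more samples contribute proportionally more.
    """
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [
        sum(n * w[i] for n, w in updates) / total
        for i in range(dim)
    ]

# Two clients: one with 1 sample, one with 3 samples.
# → [3.0, 2.0]: the larger client dominates the first coordinate.
print(fedavg([(1, [0.0, 2.0]), (3, [4.0, 2.0])]))
```

In a real deployment the weights would be tensors and the transport would be one of FedML's communication backends, but the aggregation rule is the same.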

Contributing

FedML embraces and thrives through open source. We welcome all kinds of contributions from the community. Kudos to all of our amazing contributors!
FedML has adopted the Contributor Covenant.

FedML 0.8.7

23 Jul 09:55
2b6a202

What's Changed

New Features

  • [CoreEngine/MLOps] Supported LLM record logging.
  • [Serving] Made the inference backend for DeepSpeed work.
  • [CoreEngine/DevOps] Enabled the public cloud server to be scheduled onto specific nodes.
  • [DevOps] Added the fedml light docker image and related documents.
  • [DevOps] Built and pushed light docker images and related pipelines.
  • [CoreEngine] Added timestamp when reporting system metrics.
  • [DevOps] Made the serving k8s cluster work with the latest images and updated related chart files.
  • [CoreEngine] Added the skip_log_model_net option for LLM training.
  • [CoreEngine/CrossSilo] Supported customized hierarchical cross-silo.
  • [Serving] Created the default model config and readme file if the user did not provide any model config and readme options when creating a model card.
  • [Serving] Allowed users to customize their token for endpoint and inference.

Bug Fixes

  • [CoreEngine] Fixed compatibility when opening subprocesses on Windows.
  • [CoreEngine] Fixed the issue where MPI mode did not have client rank -1.
  • [CoreEngine] Set the python interpreter based on the current running python version.
  • [CoreEngine] Fixed the issue that failed to verify the pip ssl certificate when checking OTA versions.
  • [CrossDevice] Fixed issues where the test metrics are reported twice to MLOps and loss metrics are clipped to integers on the Beehive platform.
  • [App] Fixed issues when installing flamby on the heart-disease app.
  • [CoreEngine] Added handler when utf-8 cannot decode the output and error string.
  • [App] Fixed scripts and requirements on the FedNLP app.
  • [CoreEngine] Fixed issues where FileExistsError was triggered for all os.makedirs calls.
  • [Serving] Changed the model url to open.fedml.ai.
  • [Serving] Fixed the OnnxExporterError issue and added ONNX as a default dependent library when installing fedml.
  • [Serving] Fixed the issue where the local package name is different from MLOps UI.
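The UTF-8 decode handler mentioned above (for subprocess output and error strings) can be sketched as a fallback decoder. This is an illustrative sketch, not FedML's actual code:

```python
def safe_decode(raw: bytes) -> str:
    """Decode subprocess output, tolerating invalid UTF-8.

    Non-UTF-8 bytes (common in Windows console output) are replaced
    with U+FFFD instead of raising UnicodeDecodeError.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace")

print(safe_decode(b"hello"))   # valid UTF-8 passes through unchanged
print(safe_decode(b"\xff\xfe"))  # invalid bytes become replacement chars
```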

Enhancements

  • [Serving] Established the container based on the user's config and improved code readability.

FedML 0.8.4

20 Jun 17:47
23fbb84

What's Changed

New Features in 0.8.4

At FedML, our mission is to remove the friction and pain points of converting your ML & AI models from R&D into production-scale-distributed and federated training & serving via our no-code MLOps platform.
FedML is happy to announce our update 0.8.4. This release is filled with new capabilities, bug fixes, and enhancements. A key announcement is the launch of FedLLM for simplifying & reducing the costs associated with training & serving large language models. You can read more about it on our blog post.

New Features

  • [CoreEngine] Added local REST APIs to disable, enable, and query the client agent:

curl -XPOST http://localhost:40800/fedml/api/v2/disableAgent -d '{}'
curl -XPOST http://localhost:40800/fedml/api/v2/enableAgent -d '{}'
curl -XPOST http://localhost:40800/fedml/api/v2/queryAgentStatus -d '{}'
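The same agent endpoints can be called from Python. The helper below only builds the request; the endpoint paths come from the release notes, while the payload and response handling are assumptions (the agent must be running locally on port 40800 for a call to succeed):

```python
import json
import urllib.request

BASE = "http://localhost:40800/fedml/api/v2"

def agent_request(action, payload=None):
    """Build a POST request for the local FedML agent API
    (disableAgent, enableAgent, queryAgentStatus)."""
    data = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/{action}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(agent_request("queryAgentStatus")) would send
# the call when a local FedML client agent is listening on port 40800.
```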

Bug Fixes

  • [CoreEngine] Created distinct device ids when running multiple Docker containers to simulate multiple clients or silos on one machine. The device id is now the product id plus a random id.

  • [CoreEngine] Fixed a device assignment issue in get_torch_device in the distributed training mode.

  • [Serving] Fixed the exceptions that occurred when recovering at startup after upgrading.

  • [CoreEngine] Fixed the device id issue when running in Docker on macOS.

  • [App] Fixed the issue in the fedprox + sage app for graph regression and graph classification.

  • [App] Fixed an issue with the heart disease app failing when running in MLOps.

  • [App] Fixed an issue with the heart disease app’s performance curve

  • [App/Android] Enhanced Android starting/stopping mechanism and fixed the following issues:

Fixed status displays after stopping the run.
When stopping a run during an unfinished round, the MNN process now remains in the IDLE state (it previously went OFFLINE).
When stopping after a round is done, the training now stops.
Fixed the incorrect Python server TAG in the logs, so the server is now easy to find in the logs.
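The distinct-device-id fix in the list above (product id plus a random id) can be sketched as follows. The exact format is internal to FedML, so this is an illustrative sketch only:

```python
import secrets

def container_device_id(product_id: str) -> str:
    """Derive a device id for one container instance.

    Combining the stable product id with a random suffix ensures that
    multiple containers on the same host get distinct device ids.
    """
    return f"{product_id}-{secrets.token_hex(4)}"

# Two containers started with the same product id get different ids.
print(container_device_id("fedml"))
print(container_device_id("fedml"))
```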

Enhancements

  • [Serving] Tested the inference backend and checked the response after the model deployment is finished.

  • [CoreEngine/Serving] Set the GPU option based on the availability of CUDA when running the inference backend, and optimized the MQTT connection checking.

  • [CoreEngine] Stored model caches to the user home directory when running the federated learning.

  • [CoreEngine] Added the device id to the monitor message when processing inference requests.

  • [CoreEngine] Reported the runner exception and ignored exceptions when missing the bootstrap section in the fedml_config.yaml.

FedML 0.8.3

23 Apr 09:28
a8b59a6

What's Changed

New Features

  • [CoreEngine/MLOps] Introduced the FedML OTA (Over-the-Air) upgrade mechanism for the training platform and serving platform.
  • [Documents] Added guidance for the OTA mechanism in the user guide document.

Bug Fixes

  • [Serving] Fixed an issue where exceptions occurred when activating the model inference.
  • [CoreEngine] Fixed an issue where aggregator exceptions occurred when running MPI scripts.
  • [Documents] Fixed broken links in the user guide document.
  • [CoreEngine] Checked whether the current job is empty in the get_current_job_status API.
  • [CoreEngine] Fixed a high CPU usage issue when the reload option was enabled in the client API.

Enhancements

  • [Serving] Improved data syncing between Redis server and Sqlite database.
  • [Serving] Implemented the use of triple elements (end point name/model name/model version) to identify each inference API request.
  • [DevOps] Updated Jenkinsfile to automate the building and deployment of the model serving Docker to the K8s cluster.
  • [Serving] Implemented the model monitor stop functionality when deactivating and deleting the model deployment.
  • [Serving] Checked the status of the end point when recovering on startup.
  • [CoreEngine] Refactored the OTA upgrade process for improved robustness.
  • [CoreEngine] Attached logs to the new Run ID when initiating a new run or deploying a model.
  • [CoreEngine] Refined upgrade status messages for enhanced clarity.
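The triple-element identification above (end point name / model name / model version) amounts to routing inference requests by a composite key. A minimal sketch of the idea, not FedML's internal data structure:

```python
class InferenceRegistry:
    """Route inference requests by the triple
    (end_point_name, model_name, model_version)."""

    def __init__(self):
        self._routes = {}

    def register(self, end_point, model, version, handler):
        # One handler per deployed (endpoint, model, version) triple.
        self._routes[(end_point, model, version)] = handler

    def lookup(self, end_point, model, version):
        # Returns None for unknown triples, e.g. an undeployed version.
        return self._routes.get((end_point, model, version))

registry = InferenceRegistry()
registry.register("chat-ep", "llama-7b", "v1", "handler-v1")
print(registry.lookup("chat-ep", "llama-7b", "v1"))  # the v1 handler
print(registry.lookup("chat-ep", "llama-7b", "v2"))  # None: not deployed
```

Keying on the full triple lets two versions of the same model serve side by side behind one endpoint name.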

FedML 0.8.2

31 Mar 12:26
2afd0ec

What's Changed

New Features

  • [CoreEngine/MLOps] Refactored the entire serving platform to make it run more smoothly on the Kubernetes cluster.

Bug Fixes

  • [Training] Fixed an issue where the training status remained running after training completed.
  • [Training] Fixed the issue that the Parrot platform could not collect and analyze metrics, events, and logs.
  • [CoreEngine] Made the device id unique in the Docker container.

Enhancements

  • [CoreEngine/MLOps] Ensured print logs show on the MLOps distributed logging platform.
  • [CoreEngine/MLOps] Used the bootstrap script to upgrade the FedML version when publishing a new pip package is unnecessary.

FedML 0.8.0

23 Mar 12:37

FedML Open and Collaborative AI Platform

Train, deploy, monitor, and improve machine learning models anywhere (edge/cloud) powered by collaboration on combined data, models, and computing resources

What's Changed

Feature Overview

  1. Supports MLOps (https://open.fedml.ai)
  2. Multiple scenarios:
  • FedML Octopus: cross-silo federated learning
  • FedML Beehive: cross-device federated learning
  • FedML Parrot: FL simulation with a single process or distributed computing, enabling smooth migration from research to production
  • FedML Spider: federated learning on web browsers
  3. Supports any machine learning framework: PyTorch, TensorFlow, JAX with Haiku, and MXNet
  4. Diverse communication backends (MPI, gRPC, PyTorch RPC, MQTT + S3)
  5. Differential privacy (CDP: central DP; LDP: local DP)
  6. Attacker (API: fedml.core.FedMLAttacker; README: python/fedml/core/security/readme.md)
  7. Defender (API: fedml.core.FedMLDefender; README: python/fedml/core/security/readme.md)
  8. Secure aggregation (multi-party computation): cross_silo/light_sec_agg_example
  9. Real-world applications in the FedML/python/app folder
  10. Federated model inference at MLOps (https://open.fedml.ai)
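Local differential privacy (the LDP mentioned above) typically clips each client's values and perturbs them with noise before they leave the device. A conceptual sketch in plain Python; the function name, parameters, and noise choice are illustrative and do not reflect FedML's actual LDP implementation:

```python
import random

def ldp_perturb(values, clip=1.0, noise_scale=0.1, rng=None):
    """Clip each value to [-clip, clip], then add Gaussian noise.

    The clipped-and-noised values, not the raw ones, are what a
    client would share with the aggregator under local DP.
    """
    rng = rng or random.Random()
    clipped = [max(-clip, min(clip, v)) for v in values]
    return [v + rng.gauss(0.0, noise_scale) for v in clipped]

# Values outside [-1, 1] are clipped before noise is added.
print(ldp_perturb([2.0, -3.0, 0.5], rng=random.Random(0)))
```

The clipping bound and noise scale together determine the privacy budget; real implementations calibrate the noise to a target epsilon.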

For more detailed instructions, please refer to https://doc.fedml.ai/

New Features

  • [Serving] Made all serving pipelines work: device login, model creation, model packaging, model pushing, model deployment, and model monitoring.
  • [Serving] Made all three entries for creating model cards work: from the trained model list, from the web page for creating model cards, and from the related fedml model CLI.
  • [OpenSource] Formally released all previous work as v0.8.0: training, security, aggregator, communication backends, MQTT optimization, metrics tracing, events tracing, and real-time logs.

Bug Fixes

  • [CoreEngine] Fixed a CLI engine error when running simulation.
  • [Serving] Adjusted the training code to adapt to the ONNX sequence rule.
  • [Serving] Fixed a URL error in the model serving platform.

Enhancements

  • [CoreEngine/MLOps][Log] Formatted the log time to NTP time.
  • [CoreEngine/MLOps] Show a progress bar and the size of transferred data in the log when the client downloads and uploads the model.
  • [CoreEngine] Optimized the client for weak or disconnected networks.
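Handling weak or disconnected networks, as in the last enhancement above, usually means retrying transfers with backoff. A generic exponential-backoff sketch (illustrative only, not FedML's actual client code):

```python
import time

def with_retries(fn, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on ConnectionError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the error once attempts are exhausted.
    """
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))

# Example: a flaky upload that fails twice, then succeeds.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network down")
    return "uploaded"

print(with_retries(flaky_upload, sleep=lambda d: None))  # "uploaded"
```

The injectable `sleep` parameter keeps the sketch testable without real delays.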

The old FedML library, before the FedML company was incorporated (as of 2022-04-29)

30 Apr 03:31
Revert "Revert "update events for different device id.""

This reverts commit 2a8706fadfa0337e086ddaf1356aaecd0edc3170.