
Releases: Oneflow-Inc/oneflow

Version 1.0.0

11 Mar 03:18
2491c5b


OneFlow v1.0.0 release note

OneFlow v1.0.0 came out; welcome to install the new version for a better experience.

  • Highlights
  • New Features
  • Improvements
  • Changes and Fixes
  • Performance

Highlights

This version update includes 447 commits and the following highlights:

  • Released a new interface compile_from_torch. It converts a PyTorch Module instance into a OneFlow Module instance while sharing the parameter memory, and supports direct Eager execution or conversion into a static graph nn.Graph, which is further accelerated by MLIR compilation. The interface is evolving rapidly and currently supports dynamic shape compilation; it has been validated on typical models such as ResNet50, Faster RCNN, and Stable Diffusion.

  • Made a series of optimizations and refactorings to the Eager execution runtime, including unification of system memory pools, integration with CUDA native interfaces, optimization of the instruction scheduling mechanism, introduction of an instruction fusion mechanism, faster Autograd graph construction, optimization of the Op inference process, and decoupling of Instruction and Stream, etc.

  • The static graph distributed physical execution plan supports separate compilation functionality, allowing each process to independently compile its required execution plan, eliminating linear growth of compilation time with GPU scale.

  • Added a series of functional automatic differentiation interfaces, including jvp, vjp, hvp, vhp, jacobian, and hessian.

  • Added the Insight module, supporting visualization of kernel invocations, execution time, speed, and other information within user-annotated tracepoint intervals.

  • Updated LiBai (the open-source toolbox for large-scale model training) with native support for fine-tuning and distributed inference of Llama2 and ChatGLM2, covering full finetune, adapter finetune, and LoRA finetune. lm-eval-harness can be used for language model evaluation and validation.

  • Upgrade of OneFlow Serving functionality, adding support for OneFlow Python backend and OneFlow Lite backend, in addition to the existing support for OneFlow Cpp backend.

New Features

1. compile_from_torch

The compile_from_torch interface converts a PyTorch Module instance into a OneFlow Module instance while sharing the parameter memory. It supports direct Eager execution or conversion into a static graph nn.Graph, which is further accelerated by MLIR compilation. (#10404, #10408, #9984, #9754)

Interface Signature and Parameter Introduction:

compile_from_torch(torch_module: torch.nn.Module, *, use_graph=True, options={})
* torch_module: The Torch Module instance to be converted.
* use_graph: Indicates whether to transform into a static graph nn.Graph and utilize MLIR compilation acceleration. The default is True.
* options:
  * size: When using the static graph nn.Graph, a hash of the graph corresponding to the input shape is computed and cached. size is the maximum capacity of the static graph cache; when the capacity is exceeded, cached graphs are evicted using an LRU strategy. The default value is 9.
  * dynamic: The first input with a dynamic shape triggers a full graph compilation. For subsequent inputs with different shapes, if dynamic is True, the shared graph is reused for compilation acceleration; if dynamic is False, compilation is performed from scratch each time. The default is True.
  * debug: Debug mode and log level. -1 disables debug mode; 0 outputs warnings and static graph construction information; 1 additionally outputs construction information for each sub-module; 2 additionally outputs progress for each operator; 3 outputs more detailed operator information. The default value is -1.

Example of Usage:

import torch
from torchvision import models
import oneflow
from oneflow.framework.infer_compiler import compile_from_torch
DEVICE = torch.device("cuda")
WEIGHT = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=WEIGHT).to(DEVICE)
compile_model = compile_from_torch(model, options={"dynamic": True})
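
A hypothetical follow-up call (not from the release note): assuming the compiled module accepts the same torch tensors as the original model, inference looks like this:

x = torch.randn(1, 3, 224, 224, device=DEVICE)
with torch.no_grad():
    y = compile_model(x)  # the first call triggers compilation; later cached shapes reuse the graph
print(y.shape)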

2. Separated Compilation

The static graph distributed physical execution plan supports separate compilation, allowing each process to independently compile its required execution plan, thereby preventing linear growth of compilation time with GPU scale. The separate compilation feature supports 3D hybrid parallel (data parallelism + model parallelism + pipeline parallelism) scenarios and can be used together with LiBai (the open-source large-scale model training toolbox). To enable this feature, use the command: export ONEFLOW_ENABLE_LAZY_SEPARATE_COMPILE=1. (#9920, #10140, #10141, #10124, #10102)
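
The flag can also be set from Python; a minimal sketch, assuming (as is typical for OneFlow startup flags) that it must be set before oneflow is imported:

import os
os.environ["ONEFLOW_ENABLE_LAZY_SEPARATE_COMPILE"] = "1"  # read when oneflow starts up

import oneflow as flow
# Build and compile an nn.Graph as usual; each rank now compiles only its own plan.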

Below are the test results on a 128-card A100-PCIE-40GB device with LiBai on the GPT2 model:

Parallelism                        Separated Compilation Enabled   Execution Plan Compilation Time
Data Parallelism (DP128 MP1 PP1)   No                              Over 20 minutes
Data Parallelism (DP128 MP1 PP1)   Yes                             108.21 s
3D Parallelism (DP4 MP4 PP8)       No                              445.16 s
3D Parallelism (DP4 MP4 PP8)       Yes                             82.88 s

3. Functional Automatic Differentiation Interfaces

A series of functional automatic differentiation-related interfaces have been introduced, including jvp, vjp, hvp, vhp, jacobian, and hessian. (#10412, #10428)

Example of Usage:

import oneflow as flow

# jacobian example
def exp_reducer(x):
    return x.exp().sum(dim=1)

input = flow.rand(2, 2)
jac_rslt = flow.autograd.functional.jacobian(exp_reducer, input)

# vhp example
def pow_reducer(x):
    return x.pow(3).sum()

input = flow.rand(2, 2)
v = flow.ones(2, 2)
vhp_rslt = flow.autograd.functional.vhp(pow_reducer, input, v)
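
The remaining interfaces follow the same calling convention; for instance, a hessian call, assuming it mirrors the jacobian signature above:

# hessian example
def pow_sum(x):
    return x.pow(3).sum()

input = flow.rand(2, 2)
hess_rslt = flow.autograd.functional.hessian(pow_sum, input)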

4. Insight Module

Introduced a new Insight module, enabling visualization of kernel invocations, execution time, speed, and other information within user-annotated tracepoint intervals. (#10370)

Usage:

  • Step 1: Annotate tracepoint intervals in the code using the OneFlow Profiler module.
  • Step 2: Run the code and use NVIDIA Nsight Systems to generate a .sqlite file.
  • Step 3: Use the OneFlow Insight module to generate a .json file.
  • Step 4: Open the .json file in the browser at chrome://tracing/ or edge://tracing/ to obtain the visualization interface.

For more detailed information, please refer to: https://github.com/Oneflow-Inc/oneflow/tree/master/python/oneflow/utils/insight#usage
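
A sketch of step 1; the range_push/range_pop names below are illustrative placeholders, not confirmed API (the actual Profiler annotation interface is documented in the README linked above):

import oneflow as flow

# Step 1 (illustrative): mark the interval to be visualized.
flow.profiler.range_push("forward")  # hypothetical name
loss = model(batch)                  # model and batch are placeholders
flow.profiler.range_pop()            # hypothetical name

# Step 2: run the script under NVIDIA Nsight Systems and export a .sqlite file.
# Step 3: convert the .sqlite file to a .json trace with the Insight module.
# Step 4: open the .json file at chrome://tracing/ or edge://tracing/.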

5. LiBai Version Update

  • LiBai (the open-source toolbox for large-scale model training) has been upgraded to version v0.3.0. It now natively supports finetuning and distributed inference of the large language models Llama2 and ChatGLM2, covering full finetune, adapter finetune, and LoRA finetune. lm-eval-harness can be used for language model evaluation and validation.

  • The distributed training and inference support for ChatGLM and Llama2 are as follows:

Models    2D (tp+pp) Inference   3D Parallel Training
ChatGLM   Supported              Supported
Llama2    Supported              Supported

Example of Usage:

# full finetune
bash tools/train.sh projects/Llama/train_net.py projects/Llama/configs/llama_sft.py 8
# adapter finetune
bash tools/train.sh projects/Llama/adapter/train_net.py projects/Llama/adapter/adapter_sft.py 8
# inference
bash tools/infer.sh projects/Llama/pipeline.py 8
# eval
python projects/Llama/utils/eval_adapter.py

6. Other New Features

  • Added FFT-related operators. (#10027)

  • Added zeta operator. (#10189)

  • Added tril_ operator. (#9996)

  • Added clone operator. (#9800)

  • Added frac and frac_ operator. (#9979)

  • Added exp2 operator. (#9958)

  • Added rrelu operator. (#9736)

  • Added lgamma backward operator. (#10177)

  • Added digamma operator. (#10066)

  • Added trigamma operator. (#10117)

  • Added bitwise_not operator. (#9859)

  • Added squared_relu operator. (#10316)

  • Added skip_rms_norm operator. (#10036)

  • Added multi_tensor_amp_grad_scaler related operators. (https://github.com/Oneflo...


Version 0.9.0

04 Jan 01:58


OneFlow v0.9.0 release note

OneFlow v0.9.0 came out; welcome to install the new version for a better experience.

  • Highlights
  • Backwards Incompatible Change
  • New Features
  • Performance
  • Improvements
  • Bug fixes
  • Documentation
  • Edge Tools

Highlights

This update contains 640 commits and the following highlights:

  • With the addition of 86 new API interfaces and operators aligned with PyTorch and the fix of 104 bugs related to operator compatibility, OneFlow v0.9.0 provides better PyTorch API and model compatibility. In v0.9.0, users can migrate more PyTorch models to OneFlow with one click and gain faster performance.

    • Allowing one-click migration of Stable Diffusion, GLM, YOLOv5, etc. to OneFlow.

    • More convenient model migration: oneflow.load supports directly loading models saved with torch.save.

    • With the newly added oneflow.mock_torch module and mock method, OneFlow can migrate complex PyTorch models containing multiple scripts with one click, without changing the original PyTorch scripts.

  • Global Tensor has added a series of interfaces and methods that are convenient for distributed programming, and fixed known related bugs.

  • Graph released the first version of automatic parallelism, which supports automatically searching for the fastest SBP under a specified Placement. When writing distributed models with Global Tensor, users no longer need to work out the parallel strategy themselves.

  • The Graph adds a series of optimizations related to memory, execution speed, pipeline masking, and compilation speed to improve performance and reduce memory overhead.

  • The Graph provides a series of functions to aid debugging, including analyzing memory logs, displaying the progress during the compilation stage, and the computation graph.

  • OneFlow IR provides more compilation optimization functions.

  • The error messages of OneFlow are more user-friendly: the error content is highlighted and unnecessary internal details are simplified, so you can see the location and type of an error at a glance.

  • A series of operator optimizations and system optimizations have been added, including Eager instruction scheduling, high-performance CUDA kernel, opening up of multiple memory pools, etc.

Backwards Incompatible Change

  • To solve the possible name conflict between Graph.Block.config and a user-defined module attribute module.config, OneFlow redesigned the abstraction of the Graph proxy Module/Tensor, thus introducing a breaking change: (#9351, #9437, #9607)

    • The attr and config attributes on Block are removed, and Block is renamed to Proxy;

    • Implementation: when added as members of nn.Graph, the original Eager Module and Tensor types are wrapped into the Proxy class, and the corresponding GraphModule and GraphTensor are generated. nn.Graph then uses the Proxy objects for proxy execution during composition; from a Proxy, both the original eager type and the graph type can be retrieved. The naming follows torch.fx.

    • Function: the Eager primitive type supports getting the original eager type; the Graph type (base class GraphBlock) stores the information required for graph execution, such as name/scope/lazy op or tensor and the optimization switches of some sub-modules on the graph; the Proxy execution type (base class Proxy) provides proxy execution through the same execution interface as Module and Tensor, but with changed (lazy) behavior and rewritten ops.
    • Module type: Module (eager) / GraphModule (graph) / ProxyModule, which contains a Module member and a GraphModule member.
    • Tensor type: Tensor (eager) / GraphTensor (graph) / ProxyTensor, which contains a Tensor member and a GraphTensor member.
    • Here is an example:
    import oneflow as flow
    import oneflow.nn as nn
    from oneflow.nn.graph import GraphModule
    linear = flow.nn.Linear(3, 8, False)
    class LinearGraph(nn.Graph):
        def __init__(self):
            super().__init__()
            # The type of linear is nn.Module. When added as an attribute of nn.Graph, it is registered with nn.Graph.
            # self.linear is wrapped as a ProxyModule.
            # self.linear.weight is wrapped as a ProxyTensor.
            # nn.Graph uses the ProxyModule to perform graph composition.
            self.linear = linear
            # A ProxyModule has two parts: the original Module and a GraphModule.
            self.linear.to(GraphModule)  # Get the corresponding GraphModule, on which graph-optimization settings can be configured,
            # such as setting a pipeline stage for a module and enabling pipeline parallelism:
            self.linear.to(GraphModule).set_stage(id, placement)
            self.linear.to(nn.Module)  # Get the corresponding original nn.Module.
            self.linear.weight.to(flow.Tensor)  # Get the corresponding original Tensor.

Outdated interface in OneFlow v0.8.0:

import oneflow as flow
import oneflow.nn as nn
linear = flow.nn.Linear(3, 8, False)
class LinearGraph(nn.Graph):
    def __init__(self):
        super().__init__()
        self.linear = linear
        self.linear.config.set_stage(id, placement)  # set stage
        self.linear.config.activation_checkpointing = True  # set activation checkpointing
        self.linear.origin  # get the corresponding original nn.Module
        self.linear.weight.origin # get the corresponding original Tensor

New interface in OneFlow v0.9.0:

import oneflow as flow
import oneflow.nn as nn
from oneflow.nn.graph import GraphModule
linear = flow.nn.Linear(3, 8, False)
class LinearGraph(nn.Graph):
    def __init__(self):
        super().__init__()
        self.linear = linear
        self.linear.to(GraphModule).set_stage(id, placement)  # set stage
        self.linear.to(GraphModule).activation_checkpointing = True  # set activation checkpointing
        self.linear.to(nn.Module)  # get the corresponding original nn.Module
        self.linear.weight.to(flow.Tensor)  # get the corresponding original Tensor

New Features

Graph

  • Adds the first version of the automatic parallelization feature in Graph: (#8891, #9172, #9288)

    • Automatic parallelism can be enabled by configuring self.config.enable_auto_parallel(True) in Graph. After it is enabled, you don't have to configure sbp, and the Graph will automatically find the optimal sbp combination.

    • Here is an example:

    import oneflow as flow
    class SubclassGraph(flow.nn.Graph):
        def __init__(self):
            super().__init__() # MUST be called
            # auto parallelism configuration
            self.config.enable_auto_parallel(True)
            # other configurations about auto parallelism
            # ......
    
        def build(self):
            pass
  • Graph supports the straighten algorithm optimization with memory priority, reducing each Tensor's memory lifetime by adjusting the execution order, thereby lowering peak memory usage (see the configuration sketch at the end of this list). (#9094)

    • With self.config.enable_straighten_algorithm("MemoryFirst"), the straightened algorithm with memory optimization can be enabled.

    • The available modes are as follows: "MemoryFirst" / "SpeedFirst" / "Disable" / "OverlapCpuGpu"

    • At the same time, Graph adds the "OverlapCpuGpu" algorithm, which makes CPU and GPU kernels overlap with each other as much as possible. (#9278)

  • Graph provides generalized basic transmission, using nccl send/recv to realize fast communication for any NdSbp (2D, 3D, ...), thus minimizing the transmission volume. (#8437, #8783)

  • With autograd.Function, Graph is allowed to use custom ops. (#8843)

  • The Graph optimizer supports configuring the learning rate for each module/layer's parameters through param_group["lr_scale"]. (#9138)

  • Adds the enable_multi_tensor_update optimization. Enabled by self.config.enable_multi_tensor_update(True), it reduces the overhead of updating many small, fragmented parameters. (#9209, #9252)

  • Adds the enable_fused_model_update_cast optimization. Enabled by self.config.enable_fused_model_update_cast(True), it speeds up training by fusing the Optimizer with the fp16 cast when AMP is on. (#9209)

  • Graph supports non-uniform segmentation under ND-SBP. (#9310)

  • Graph supports LazyTensor's indexing feature. (#9334)

  • Adds enable_compress_memory int...
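
A minimal sketch pulling together the Graph configuration switches named in this list; the flag names come from the notes above, while the surrounding Graph is a stub:

import oneflow as flow

class ConfiguredGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        # Memory-priority straighten algorithm; other modes: "SpeedFirst" / "Disable" / "OverlapCpuGpu".
        self.config.enable_straighten_algorithm("MemoryFirst")
        # Fuse the updates of many small, fragmented parameters.
        self.config.enable_multi_tensor_update(True)
        # Fuse the optimizer with the fp16 cast when AMP is on.
        self.config.enable_fused_model_update_cast(True)

    def build(self, x):
        return x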


Version 0.8.0

18 Jul 06:05

OneFlow v0.8.0 Release Note

OneFlow v0.8.0 came out; welcome to install the new version for a better experience.

  • Highlights
  • Backwards Incompatible Change
  • Deprecations
  • New Features
  • Performance
  • Improvements
  • Bug fixes
  • Documentation

Highlights

This update contains 523 commits and the following highlights:

  • PyTorch-compatible APIs have been further optimized: 68 new APIs aligned with PyTorch have been added, and 84 operator and interface compatibility bugs have been fixed. More PyTorch models can be migrated to OneFlow with one click.

  • All operators support Global Tensor more completely and efficiently, 28 Global Tensor-related bugs have been fixed, and 180 operator unit tests have been newly added.

  • Graph's advanced features have been further optimized:

    • In addition to the existing ZeRO-DP, the Zero Redundancy Optimizer (ZeRO) can also be used in combination with MP parallelism, 2D parallelism, and 3D parallelism, saving more memory overhead.

    • Graph provided new pipeline parallelism API, which not only simplifies the pipeline parallelism configuration but also optimizes the performance of pipeline parallelism and 3D parallelism.

    • Multi-dimensional debugging functionality has been added for the logical graph, the light plan physical graph, memory analysis, and Python stack information, making Graph.debug more efficient.

  • Empowered by OneFlow v0.8.0 and LiBai v0.2.0, 3D parallelism speed on GPT and BERT witnesses a notable increase, and the training speed exceeds Megatron-LM with the same configuration in multiple dimensions. For more details, please click here.

  • OneEmbedding has been released recently. It is an extension component designed for large-scale recommendation systems, boasting high efficiency, extensibility, flexibility, and other advantages.

  • Multi-device adaptation: OneFlow v0.8.0 provides a neat, efficient, and easily extensible hardware abstraction layer called EP (Execution Provider) and defines a collection of basic computing interfaces called Primitive, allowing kernels to be re-implemented based on the Primitive interface.

  • Added new debugging tool stacks: OneFlow-Profiler and AutoProf

    • OneFlow-Profiler is a tool designed to collect performance information during framework execution. It can record the execution time of operators and system components, the allocation of memory and DRAM, and the corresponding input and parameters of operators. The information can help developers find out the main source of overhead in framework execution and thus implement targeted optimization.

    • AutoProf is a framework designed to efficiently detect the alignment between OneFlow APIs and PyTorch APIs. Besides, it can automatically compare the performance results of OneFlow APIs and PyTorch APIs.

  • Significantly optimized the exception handling process in OneFlow API and improved the error message when APIs meet exceptions.

  • Significantly optimized the OneFlow API documentation: the API documentation has been restructured based on functionality. In addition to general operator APIs, oneflow.nn.graph, oneflow.embedding, oneflow.autograd and other modules in OneFlow and their environment variables have also been explained in detail.

Backwards Incompatible Change

  • Graph's ZeRO configuration API has been redesigned, reducing configuration and learning costs for users. Besides, the latest ZeRO supports 2D mixed parallelism (model parallelism plus pipeline parallelism) and 3D parallelism. (#8036, #8404, #8464)

Outdated configuration method in OneFlow v0.7.0:

import oneflow as flow

class Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.linear = flow.nn.Linear(3, 8, False)
        self.config.set_zero_redundancy_optimizer_mode("distributed_split")
        if zero_stage > 1:
            # stage 2
            flow.boxing.nccl.enable_use_compute_stream(True)
            if zero_stage > 2:
                # stage 3
                flow.boxing.nccl.disable_group_boxing_by_dst_parallel(True)
    def build(self, x):
        return self.linear(x)

graph = Graph()

New interface in OneFlow v0.8.0:

import oneflow as flow

class Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.linear = flow.nn.Linear(3, 8, False)
        self.config.enable_zero(stage=2)
    def build(self, x):
        return self.linear(x)

graph = Graph()

Deprecations

Python API

  • The outdated parameter axis in oneflow.sbp.split() (still accepted for compatibility) has been uniformly replaced with dim to denote the slice dimension. (#8411)

v0.7.0

oneflow.sbp.split(axis=0)

v0.8.0

oneflow.sbp.split(dim=0)
  • For the outdated pipeline parallelism configuration method self.module_layer_0.config.stage_id = 0 (no longer recommended), a new pipeline parallelism API config.set_stage has been added, which improves pipeline parallelism performance and avoids calling input_tensor.to_global(placement=this_stage_placement) on every module input tensor at each stage. (#8442)

v0.7.0

import oneflow as flow

B = [flow.sbp.broadcast]
P_0 = flow.placement(type = "cuda", ranks = [0, 1])
P_1 = flow.placement(type = "cuda", ranks = [2, 3])

class Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m_stage0 = flow.nn.Linear(8, 8, False).to_global(placement=P_0, sbp=B)
        self.m_stage1 = flow.nn.Linear(8, 8, False).to_global(placement=P_1, sbp=B)
        # Set each module's stage id to hint the graph to prepare the right number of buffers in the pipeline.
        self.m_stage0.config.stage_id = 0 
        self.m_stage1.config.stage_id = 1
        self.config.set_gradient_accumulation_steps(4)        

    def build(self, x):
        x = x.to_global(placement=P_0, sbp=B)
        y = self.m_stage0(x)
        # Move tensor between different pipeline stages.
        y = y.to_global(placement=P_1, sbp=B)
        z = self.m_stage1(y)
        return z

v0.8.0

class Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m_stage0 = flow.nn.Linear(8, 8, False).to_global(placement=P_0, sbp=B)
        self.m_stage1 = flow.nn.Linear(8, 8, False).to_global(placement=P_1, sbp=B)
        # set_stage(stage_id, placement)
        # The Stage ID is numbered starting from 0 and increasing by 1.
        # The placement applies to all tensors of this module.
        self.m_stage0.config.set_stage(stage_id=0, placement=P_0)
        self.m_stage1.config.set_stage(stage_id=1, placement=P_1)
        self.config.set_gradient_accumulation_steps(4)
    
    def build(self, x):
        # tensor.to_global(placement) is applied automatically to every input tensor of this module,
        # so there is no need to write to_global() inside or outside the module forward function.
        y = self.m_stage0(x)
        z = self.m_stage1(y)
        return z

New Features

Graph

  • Added new interfaces oneflow.env.init_rdma and oneflow.env.rdma_is_initialized to delay turning on RDMA, thus accelerating network communication across multiple devices (note: avoid using fork() after RDMA is turned on; for example, a DataLoader with num_workers > 1 should be created before init_rdma; see the sketch at the end of this list). #8415

  • Graph provides a new algorithm optimization interface graph.config.enable_straighten_algorithm to optimize the execution order in the computation graph, maximizing the overlap between data transfer and computation. With this interface, data transfer speed rises by 0.6% in data parallelism mode and by 6% in model parallelism mode. (#8347, #8483, #8495)

  • Optimized the implementation of clip grad in Graph to support clip_grad_max_norm > 1.0 and made clip_grad_norm_type configurable: previously it could only be set to 2, and now it can be set to +/- inf, +/- 1, +/- 2, +/- 3, and larger p-norm values. (#7548)

  • Global tensor in Graph supported the tensor.set_item operation for invariable ops, for example, mask[:, :len_keep] = 0 (#7751)

  • Graph exports the build_graph and compile_and_init_runtime interfaces, allowing user-defined passes to be compiled after the graph is built, thus rewriting and optimizing the graph. The two interfaces also allow Graph to restore an external graph (job). (#8168)

  • Added the RegisterJobPass interface to support rewriting the self-defined external job pass graph. (#8370)

  • oneflow.boxing.nccl.enable_use_compute_stream(True) optimizes support for NCCL logical kernels:

    • Added noncontiguous ReduceScatter kernel to support the conversion of P -> S(i), (i > 0) (#8361)

    • Supported the conversion of B -> S (#8355)

    • Enabled nccl send/recv primitives to support special SBP conversions (htt...
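
A sketch of the delayed-RDMA pattern from the first bullet in this list; the DataLoader line is a placeholder for any fork()-based worker setup:

import oneflow as flow

# Create fork()-based workers first, e.g. a DataLoader with num_workers > 1:
# loader = flow.utils.data.DataLoader(dataset, num_workers=2)  # placeholder dataset

# ...then turn on RDMA; fork() must not be called after init_rdma().
if not flow.env.rdma_is_initialized():
    flow.env.init_rdma()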


Version 0.7.0

18 Mar 06:14

OneFlow v0.7.0 Release Notes

OneFlow v0.7.0 came out. Welcome to use it. We would love to hear your feedback!

Chinese version of this article:

https://mp.weixin.qq.com/s/dSR-2Xw92eoFhF0c6MtutQ

Highlights

This release has the following highlights:

  1. Provides a Tensor that can be executed in multi-node multi-GPU scenarios: Global Tensor. It is an easy-to-use solution for distributed execution, making it easier to implement various distributed parallel strategies and enabling more flexible and user-friendly distributed implementations. It supports models including ResNet50, Wide and Deep, GPT, Bert, Swin-Transformer, InsightFace, etc.

  2. Continues to improve nn.Graph. Supports advanced features such as ZeRO, GradAcc, Checkpointing, and Pipelining, and enriches the graph.debug mode. Supports random 2D SBP conversion, semi-automatic derivation of 2D SBP, resuming training from the last checkpoint, etc. Adds OneFlow Feature Stages identifications for each feature of nn.Graph: basic features are at the Beta Stage, which can meet most user requirements, while advanced features are at the Alpha Stage, meeting essential requirements.

  3. Deeply optimizes the performance of Eager mode. The performance of the Swin-Transformer model is 3 times higher than that of v0.6.0 when tested on the V100.

  4. Operator-related improvements: In the single-node single-GPU scenario, OneFlow's compatibility with PyTorch is further improved. The interfaces, semantics, and results of operators supported by OneFlow are consistent with those of PyTorch, and an automatic testing framework is designed to verify the consistency. With common models, you can accomplish the migration by running import oneflow as torch. Compared with v0.6.0, OneFlow adds 16 operators, optimizes the performance of 6 operators, and fixes bugs in 16 operators.

  5. Supports Einsum and View mechanism.

  6. Compiler-related improvements: OneFlow is officially connected to the MLIR ecosystem.

  7. Releases OneFlow-Serving v0.1.0: We provide an out-of-the-box Triton OneFlow backend docker image. Try it here.

  8. Releases LiBai v0.1.0, a toolbox for massively distributed parallel training of Transformer. Compared with customized code bases such as Megatron-LM, LiBai provides a series of models and training components for distributed training based on a modular design, aiming to make models trained in distributed mode as convenient as in single-GPU mode.

  9. Releases Flow-Vision v0.1.0: adds DeiT, ConvNeXt, ReXNet, and other models and updates tutorials and documentation.

OneFlow Feature Stages identifications

OneFlow Feature Stages identifies the maturity level of OneFlow features. It provides users with a status description of each feature to inform them of its specific level of completeness, API stability, documentation, etc. It provides OneFlow developers with a standard for feature refinement, which facilitates further improvement.

OneFlow Feature Stages

  • Stable Stage

    • Purpose: release for production use
    • Audience: all users
    • Functionality: same as RC
    • Testing: same as RC
    • Performance: same as RC
    • API: same as RC, with stability within long cycles (e.g., 1 year) and large versions (e.g., 1.0)
    • Documentation: same as RC
  • Release Candidate (RC) Stage

    • Purpose: release for deployment evaluation in production environments
    • Audience: all users, including those who want to deploy production environments
    • Functionality: being able to handle exceptions as well as normal inputs.
    • Testing: end-to-end deployment validated in external environment with good experience
    • Performance: provide evaluation reports and documentation to evaluate performance and scalability in external environments
    • API: API for external user evaluation
    • Documentation: features in this stage are added to the core-feature-set documentation
  • Beta Stage

    • Purpose: release to provide a relatively stable, complete, and available version
    • Audience: all users, especially those with strong feature demands, little concern for unknown trivial issues, and willingness to provide feedback
    • Functionality: complete functionalities addressing the needs of various possible scenarios
    • Testing: complete, covering various corner test cases, and various end-to-end integration tests
    • Performance: performance evaluation and scalability evaluation
    • API: recognized as complete and stable by seed users after full review
    • Documentation: tutorials that describe the usage process
  • Alpha Stage

    • Purpose: release to get early feedback for experimental features
    • Audience: developers and expert users
    • Functionality: core functionality completed
    • Testing: unit testing completed for core requirements of the feature, possibly with unknown bugs
    • Performance: evaluated
    • API: well-defined but not rigorously reviewed, possibly requiring further changes
    • Documentation: API documentation is a must to provide feature definitions
  • Pre-alpha Stage

    • Purpose: release to validate feature prototypes or address urgent needs
    • Audience: feature developers
    • Functionality: limited prototype functionalities
    • Testing: limited testing, possibly with many bugs
    • Performance: unknown
    • API: prone to changes
    • Documentation: possibly none

OneFlow Framework

1. Distribution

Global Tensor

Global Tensor is a newly released set of distributed computing interfaces. It can easily support any parallelism including data parallelism, model parallelism, and pipeline parallelism. Unlike a normal Tensor (hereafter called Local Tensor), Global Tensor is a Tensor with a global view, whose data is distributed in a specific way across a set of devices in a cluster, and each node stores some or all of the Global Tensor's data. Placement and SBP are the basic properties of the Global Tensor that describe the distribution of the data in clusters.

Global Tensor's data distribution

Global Tensor supports three different ways of data distribution, which we collectively refer to as SBP.

  • Split (dim): The data is equally split along dim dimension and distributed to each device.
  • Broadcast: The data is replicated between each device.
  • PartialSum: The data on each device is a partial value of the full tensor; the complete tensor is the element-wise sum across the devices.
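
A small sketch constructing one Global Tensor per distribution, in the same constructor style as the examples below (run with two ranks):

import oneflow as flow

P = flow.placement("cuda", ranks=[0, 1])
x_split = flow.rand(4, 4, placement=P, sbp=flow.sbp.split(0))     # rows divided across the two ranks
x_bcast = flow.rand(4, 4, placement=P, sbp=flow.sbp.broadcast)    # a full copy on each rank
x_psum = flow.rand(4, 4, placement=P, sbp=flow.sbp.partial_sum)   # each rank holds one addend of the sum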

Consistent computational interfaces

Global Tensor has basically the same computational interfaces as Local Tensor. With only small changes, single-GPU code can be converted to distributed code.

Local Tensor:

>>> import oneflow as flow
>>> x = flow.tensor([1.0, 2.0])
>>> y = x * x

Global Tensor:

>>> import oneflow as flow
>>> x = flow.tensor([1.0, 2.0],
            placement=flow.placement("cuda", ranks=[0, 1]),
            sbp=flow.sbp.split(0))
>>> y = x * x
# This multiplication is performed on both rank 0 and rank 1

Supporting conversion between Local Tensor and Global Tensor

  • With Tensor.to_global interface, you can create a Global Tensor based on a Local Tensor, and regard this tensor as the local tensor of the Global Tensor on the present device.

  • With Tensor.to_local interface, you can return the local tensor of the Global Tensor on the present device.

Local Tensor to Global Tensor:

>>> import oneflow as flow
>>> x = flow.tensor([1.0, 2.0])
>>> y = x.to_global(
            placement=flow.placement("cuda", ranks=[0, 1]),
            sbp=flow.sbp.split(0))
>>> y.size()
oneflow.Size([4])
>>> y
tensor([1., 2., 1., 2.],
       placement=oneflow.placement(type="cuda", ranks=[0, 1]),
       sbp=(oneflow.sbp.split(axis=0),), dtype=oneflow.float32)

Global Tensor to Local Tensor:

>>> import oneflow as flow
>>> x = flow.tensor([1.0, 2.0],
            placement=flow.placement("cuda", ranks=[0, 1]),
            sbp=flow.sbp.split(0))
>>> y = x.to_local()
>>> y.size()
oneflow.Size([1])
>>> y
tensor([1.], device='cuda:0', dtype=oneflow.float32)
# tensor([2.], device='cuda:0', dtype=oneflow.float32) if rank is 1

Supporting redistribution of Global Tensor in clusters

With the Tensor.to_global interface, you can redistribute the data of a Global Tensor in a cluster: the data can be distributed to another set of nodes, and the way it is distributed across those nodes can also be changed (i.e., its SBP). Redistribution usually incurs inter-process data communication, but the Tensor.to_global interface hides the complicated low-level communication details.

>>> import oneflow as flow
>>> x = flow.tensor([1.0, 2.0], placement=flow.placement("cuda", ranks=[0, 1]), sbp=flow.sbp.split(0))
>>> y = x.to_global(placement=flow.placement("cuda", ranks=[2, 3]), sbp=flow.sbp.broadcast)

Each operator of OneFlow defines a set of SBP signatures for its input and output tensors. Global Tensor supports automatic redistribution to provide the SBP signature required by a certain interface, as shown in the code below:

>>> import onefl...

Version 0.6.0

07 Jan 06:06
eabe79e

OneFlow v0.6.0 Release Notes

OneFlow had been open source for 528 days since July 31, 2020. Today OneFlow v0.6.0 came out. Welcome to use OneFlow v0.6.0; we would love to hear your feedback!

This version mainly updates three parts: framework, models, and OneFlow-ONNX. Highlights include:

  • Performance optimization in static graphs, dynamic graphs, operators, memory occupation, etc
  • A larger number of common operators
  • Improvements in static graphs and ConsistentTensor
  • Serving functionality as Nvidia Triton's backend
  • Richer visual pre-training models similar to torchvision and timm
  • Better OneFlow-ONNX conversion functionality

The following are the detailed release notes.

Framework

1. Performance Optimization of nn.Graph

  • Compared to v0.5.0, nn.Graph in v0.6.0 delivers a 10% speedup in training on models such as ResNet AMP and WDL, etc
    • Optimized nn.Graph's performance in high frequency iterative training scenarios
    • Redesigned the scheduling instructions of nn.Graph and refactored the interaction logic between Actor Graph and Eager VM so that the runtime execution of the Graph is asynchronous and parallel to Python input/output Tensor as much as possible

2. Performance Optimization of Eager

  • Compared to v0.5.0, v0.6.0 OneFlow Eager's training speed increases dramatically in small batch scenarios
    • Optimized the scheduling logic for virtual machines
    • Optimized get/set item
    • Optimized tensor.numel()
    • Optimized oneflow.Size()

3. Performance Optimization of Operators

  • Optimized some operators that affect the performance of new models, significantly improving the training speed of those models

4. Performance Optimization of Eager's Memory Occupation

  • Optimized some operators' memory occupation during network training, allowing the same computing device to run bigger models or more data
    • Optimized the backward memory occupation of broadcast binary operators
    • Optimized the backward memory occupation of Slice operator
    • Optimized the memory occupation of LayerNorm operator

5. More Useful Features to Static Computation Graph (nn.Graph)

  • The newly added features relate to the efficiency, debugging, completeness, and usability of static graphs
    • To help the debugging of static graphs, we added the following features:
      • debug mode supports graph.debug(1), which prints more information about graph composition
      • Provided the environment variable ONEFLOW_DEBUG_PASS to show the changes in the computed graph before and after compile-time optimization
      • Added user-readable thread naming information to Nsight Profile for locating and retrieving target key thread locations
      • Added many static graph test cases and added automatic nn.Graph tests that accompany Eager tests
    • Provided graph.save() and load() interfaces to support the deployment of models (Serving) using nn.Graph
    • To do AMP acceleration on GPUs that use TensorCore, the environment variable ONEFLOW_ENABLE_NHWC is provided to make CNN-related operators compute in channels-last layout (see the sketch after this list)
    • Enabled nn.Graph to support more usage scenarios:
      • Supported for Sparse Update Optimizer for sparse update of parameters in WDL scenarios
      • Supported for using the following nn.Module Containers with nn.Graph:
        Sequential, ModuleList, ModuleDict, ParameterList, and ParameterDict
      • Supported for creating Optimizer in the init function of nn.Graph
      • Supported multiple parameters sharing the same Tensor with nn.Graph
      • Supported for scenarios where the actual number of processes is greater than the number of GPU devices
      • Supported more Inplace execution for Consistent SBP inference under nn.Graph
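
A sketch of the two environment switches mentioned in this list; the value "1" is an assumption, so check the documentation for the exact accepted values:

import os
os.environ["ONEFLOW_DEBUG_PASS"] = "1"   # assumed value; show the graph before/after compile-time optimization
os.environ["ONEFLOW_ENABLE_NHWC"] = "1"  # assumed value; channels-last computation for CNN operators

import oneflow as flow
# graph = MyTrainGraph()  # placeholder nn.Graph subclass
# graph.debug(1)          # print more composition information, as described above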

6. A Larger Number of Operators

7. User-Defined autograd.Function

Users can customize autograd.Function just like using Torch.
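
A minimal sketch, assuming the same staticmethod forward/backward and apply protocol as torch.autograd.Function, which this note says OneFlow mirrors:

import oneflow as flow

class MySquare(flow.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output

x = flow.rand(3, requires_grad=True)
y = MySquare.apply(x).sum()
y.backward()  # x.grad == 2 * x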

8. Added Basic Serving Functionality

Serving functionality of models is provided by OneFlow as Nvidia Triton's backend.

9. Added Some Functionalities of Tensor (ConsistentTensor)

  • Supported Tensor using 2-D SBP to represent arbitrary hybrid parallelism (such as a Linear operation that runs data parallelism in the row direction of the device matrix and model parallelism in the column)
  • Supported Tensor's conversion from arbitrary 1-D SBP to 2-D SBP (the network consists of a mixture of 1-D parallel and 2-D parallel)
  • Supported constructing ConsistentTensor from numpy
  • oneflow.from_numpy()
  • oneflow.numel()
  • tensor.expand_as()
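
A quick sketch of the three interfaces just listed (shapes are arbitrary):

import numpy as np
import oneflow as flow

t = flow.from_numpy(np.ones((2, 3), dtype=np.float32))  # tensor from an ndarray
n = t.numel()                                           # 6 elements
u = flow.zeros(2, 1)
v = u.expand_as(t)                                      # expand u to t's shape (2, 3)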

Model

Released flowvision 0.0.54.

1. Richer Visual Pre-training Models

Image Classification

  • CNN series: ResNet, DenseNet, VGG, ResNext, EfficientNet, etc
  • Vision Transformer series: ViT, PVT, Swin-Transformer, etc
  • Vision MLP series: Mlp-Mixer, Res-MLP, g-MLP, etc

Object Detection

  • SSD, SSDLite
  • Faster R-CNN
  • RetinaNet

Image Segmentation

  • FCN
  • DeepLabV3

Style Transfer

  • StyleNet: supports the styles sketch, candy, mosaic, rain_princess, and udnie

2. Implemented Data Augmentation Operations Similar to torchvision

For data augmentation operations like CenterCrop and ColorJitter that are similar to torchvision's, developers can run import flowvision as torchvision to cover most scenarios.

3. Implemented Advanced Data Augmentation Operations Similar to timm

Advanced data augmentation operations implemented in flowvision.data:

  • Mixup
  • CutMix
  • Random-Erasing
  • AutoAugment
  • RandAugment
  • AugMix

4. Separated the Layers Module and Provided a Plug-and-play Block when Building a Model

flowvision.layers.attention

  • Implemented plug-and-play attention models like Non-Local, SELayer, CBAM, BAM, ECA, etc

flowvision.layers.blocks

  • Provided modules that might be used for model building like PatchEmb, Pooler, ConvBnAct, etc

flowvision.layers.regularization

  • Provided regularization modules such as drop-path, drop-block, and stochastic depth to improve model generalization ability
  • Provided separate files such as activation and weight_init to improve components like activation functions and initialization methods

OneFlow-ONNX Conversion

Updated OneFlow to ONNX toolkit:

  • Supported OneFlow model converting to ONNX model in CPU or GPU mode
  • Added test cases for operators and models to align all classification models in OneFlowVision library
  • Fixed onnx-runtime bugs during PReLU conversion
  • Compatible with v1.9.0 onnx-runtime library or later versions
  • Released the v0.5.4 oneflow-onnx package; developers can run pip install oneflow-onnx to try it

v0.5.0

08 Oct 07:06
57d0d18

Changelog

v0.5.0 (8/10/2021)

Highlights

  • First-class support for eager execution. The deprecated APIs are moved to oneflow.compatible.single_client
  • Drop-in replacement of import torch for existing PyTorch projects. You can test it by interchanging import oneflow as torch and import torch as flow.
  • nn.Module for eager execution
  • nn.Graph for lazy execution
  • DDP for data parallel

A sneak peek of the new API

Here is a minimal example showcasing how to incorporate an nn.Module in an nn.Graph and run it in lazy mode.

class NeuralGraph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.model = model # model is a nn.Module instance

    def build(self, x):
        y_pred = self.model(x)
        return y_pred

graph = NeuralGraph(model) # to create a nn.Graph instance
y_pred = graph(x) # to run the created nn.Graph

New in Python API

  • [feature][eager][op][test][python][interface] Add test for convtranspose2d #5239
  • [enhancement][python][interface] Add GroupNorm #5175
  • [enhancement][eager][python][interface] [Add] avgpool1d avgpool3d #5165
  • [feature][eager][op][python][interface] Add deconv cpu impl #5224
  • [bug][eager][api][python][interface] Fix acosh bug #5221
  • [feature][eager][op][python][interface] Dev modules ctc loss #5168
  • [bottleneck][bug][documentation][python][interface] Fix meshgrid test bug #5208
  • [eager][documentation][python][interface] Rename CosineScheduler to CosineAnnealingLR #5112
  • [feature][eager][python][interface] Add meshgrid module #5205
  • [enhancement][feature][bug][op][python] support bias in conv2d's parameter list #5322
  • [eager][documentation][api][python][interface] add not_equal, greater_equal and less_equal module #5350
  • [enhancement][eager][python] refine pow module and its test #5319
  • [enhancement][eager][op][python] Add triu op #5329
  • [enhancement][bug][python] Fix optimizer for not supporting all kinds of iterables #5355
  • [bug][python][interface] raise IndexError in get_canonical_index to support for loop #5345
  • [bug][python][interface] tensor slice assign supports broadcasting #5344
  • [enhancement][op][python] add cpu group conv logic #5314
  • [enhancement][python] Add 'nn.Mish' module and corresponding functions #5310
  • [enhancement][build][python] Remove ONNX from setup py #5297
  • [enhancement][python][interface] [add] zeropad2d #5278
  • [feature][system][python][interface] Lazy nn.Graph FeedInputOpExpr #5458
  • [feature][python][interface] integrate nn.image.flip #5411
  • [bug][python] Fix issues in point of MultiClientSession #5469
  • [enhancement][bug][python] update HasAllMultiClientEnvVars() #5459
  • [enhancement][python] Add in_top_k function #5428
  • [enhancement][python] Dev add docstring #5449
  • [feature][api][python] MultiClientSession #5407
  • [documentation][python] remove --user #5431
  • [feature][python][interface] nn.Graph python #5309
  • [feature][python][interface] Fea/nn graph/graph name #5413
  • [bug][python][interface] rm nn.Graph.train #5424
  • [op][documentation][api][python][interface] add bernoulli module #5353
  • [enhancement][python] flow.S/B/P #5306
  • [enhancement][documentation][python] Add instruction on upgrade pip #5400
  • [enhancement][python] Rm oneflow export and experimental #5589
  • [bug][python] Fix nn.graph.utils module conflict #5598
  • [feature][ci][python] Update autotest framework #5520
  • [enhancement][python] copy of_proto_python_dir to compatible_single_client_python #5539
  • [enhancement][api][python] del default env init #5537
  • [enhancement][python] Fix single client using same glog file #5535
  • [bug][api][python] Fix Session TryClose #5531
  • [enhancement][feature][python] split vector-matrix norm #5478
  • [feature][eager][op][python][interface] Add more upsample kernel #5382
  • [enhancement][feature][test][python] add torchstyle unittest #5489
  • [feature][system][python] nn.Graph with training #5662
  • [enhancement][feature][python] Fea/nn graph/block proxy func #5727
  • [enhancement][api][python] consistent_tensor_to_api #5703
  • [feature][eager][op][python] Dev Align torch avgpool #5610
  • [enhancement][python] fix circular deps of sbp python module #5706
  • [documentation][python] [part5]Remove singleclient outdated api #5674
  • [enhancement][python] [part4]Remove singleclient outdated api #5672
  • [bug][op][python] remove outdated code in conv3d #5696
  • [enhancement][test][python] enlarge tolerance of dataloader test #5689
  • [enhancement][test][python] add autotest for some math ops #5646
  • [feature][python] nn.Graph optimizer part 2: add L2, pass job complete, refactor #5604
  • [enhancement][python] Add clip_grad_norm #5299
  • [purge][python] Remove Single-Client API in oneflow default python #5827
  • [bug][python] Fix ddp grad size #5834
  • [enhancement][feature][python] Dev RMSprop graph conf #5768
  • [enhancement][purge][eager][python] remove scale arg in optimizer #5821
  • [enhancement][feature][python] graph/block io check #5803
  • [enhancement][feature][python] Dev adam graph conf #5709
  • [purge][python] [part10]Remove singleclient outdated api #5756
  • [feature][api][python] better repr of nn.Graph for debug #5762
  • [bug][python] fix weight decay in RMSprop #5755
  • [purge][python] [part9]Remove singleclient outdated api #5752
  • [purge][python] [part8]Remove singleclient outdated api #5750
  • [**document...

v0.5rc2

28 Sep 06:35
89bbc5b

Changelog

v0.5rc2 (28/09/2021)

The highlights, API preview, and Python API changelog for this release candidate are identical to those in the v0.5.0 notes above.

v0.5.0b1: Dev functional batch_gather (#6233)

13 Sep 08:58
1868c19

Changelog

v0.5.0b1 (13/09/2021)

The highlights, API preview, and Python API changelog for this beta release are identical to those in the v0.5.0 notes above.

v0.3.0

13 Sep 10:00
1f55e3a
v0.3.0 Pre-release
fix default stride value (#6248)

* fix default stride value

* rename

* Remove numpy

* fix autotest to default

* modify x to input

v0.5rc1: graph support eager lrs (#6262)

14 Sep 05:41
76e78fd

Changelog

v0.5rc1 (13/09/2021)

The highlights, API preview, and Python API changelog for this release candidate are identical to those in the v0.5.0 notes above.