Releases: Qihoo360/XLearning
XLearning 1.4
Release XLearning 1.4
Major Features And Improvements
- Support running applications in Docker
- Support MPI applications
- ClusterDef is available for the TensorFlow Distribution Strategy API
- Allow memory to be set separately for the chief and estimator workers of a TensorFlow application
- Allow the YARN node label to be specified for job execution
- Upload the output with multiple threads
- Allow incremental upload of intermediate results
- Support regular-expression matching for input paths
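To illustrate what regular-expression matching of input paths means in practice, here is a minimal Python sketch. It is not XLearning's implementation: the function name `match_input_paths` and the sample paths are hypothetical, and the release note does not specify the pattern syntax XLearning accepts.

```python
import re


def match_input_paths(pattern, candidates):
    """Return the candidate paths that fully match the regular expression.

    Hypothetical helper for illustration only; not part of XLearning.
    """
    compiled = re.compile(pattern)
    return [path for path in candidates if compiled.fullmatch(path)]


# Hypothetical directory listing.
paths = [
    "/data/train/part-00000",
    "/data/train/part-00001",
    "/data/train/_SUCCESS",
    "/data/test/part-00000",
]

# Select only the five-digit part files under /data/train.
selected = match_input_paths(r"/data/train/part-\d{5}", paths)
print(selected)
```

Using a full match rather than a prefix search keeps marker files such as `_SUCCESS` out of the selection.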
Bug Fixes and Other Changes
- The memory usage adjustment prompt is displayed only when the application finishes successfully.
XLearning 1.3
Release XLearning 1.3
Major Features And Improvements
- Support lightLDA; see examples/lightLDA for usage
- Support xflow; see examples/xflow for usage
- Support user-defined environment variables, set through a configuration parameter at submission
- Set the last worker as the estimator role of a distributed TensorFlow application if the user sets tf-evaluator to true; see examples/tfEstimators for usage
- Define the index of the single worker that saves the output by setting output-index
- Optimize the port reservation mechanism
- Add a priority mechanism for allocating containers close to local data
- Display resource request and usage information
- Extend the ps role: more convenient rendering of metrics usage information, and upload of output
Bug Fixes and Other Changes
- Fix containers hanging while waiting for the remaining machines' port addresses after another container fails in distributed mode
- Release redundant container requests once the workers have been allocated, and add the remove-request operation
- Fix application failure in PLACEHOLDER mode caused by overly long environment variables for the input
- Control the conditions under which a job execution is judged to have failed
- Fix the incorrect status code returned when a container exits successfully
XLearning 1.2
Release XLearning 1.2
Major Features And Improvements
- Client prints the container status information when the state changes
- Add the xlearning.localresource.timeout configuration to control the local resource download
- Support VisualDL; see examples/mxnetVisualDL for usage
- Support the local cache when the input strategy is inputformat with epoch greater than 1
Bug Fixes and Other Changes
- Add exception handling for the board and metrics processes
XLearning 1.1
Release XLearning 1.1
Major Features And Improvements
- Automatically scale worker or ps memory when an application retries after a failure
- Exit the application as failed when container allocation exceeds the time limit
- Support the user's job jar via --jars at application submission
- Add CPU metrics to the web display. Note that if your Hadoop version is lower than 2.6.4, please see the FAQ first.
- Support more distributed deep learning frameworks, such as XGBoost and LightGBM. Please see the FAQ for usage details.
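The memory auto-scaling item above can be pictured with a small sketch. The release note does not say how XLearning computes the new memory size, so the growth factor, the optional cap, and the function name `scaled_memory_mb` below are all assumptions made for illustration.

```python
def scaled_memory_mb(base_mb, attempt, factor=1.5, cap_mb=None):
    """Memory to request on a given attempt (attempt 0 is the first run).

    Assumed policy: multiply by a fixed factor per retry, optionally capped.
    This is not XLearning's actual scaling rule.
    """
    requested = int(base_mb * factor ** attempt)
    if cap_mb is not None:
        return min(requested, cap_mb)
    return requested


# Request sizes for the first run and two retries, capped at 4096 MB.
for attempt in range(3):
    print(attempt, scaled_memory_mb(2048, attempt, cap_mb=4096))
```

A cap of this kind keeps repeated retries from requesting more memory than a node can provide.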
Bug Fixes and Other Changes
- Fix a null pointer exception in the AppController
- Add more examples, especially for distributed mode applications
- The FAQ provides detailed instructions on how to use the new features