New Concurrency Model #2634

Draft: wants to merge 62 commits into main

Commits
7cc8205
Added initial stub deeplog objects
nvoxland Sep 12, 2023
0e3eb9b
Added initial stub deeplog objects
nvoxland Sep 12, 2023
53b1b13
create_tensor action + load new meta
FayazRahman Sep 13, 2023
53c546a
Fixed type hints
nvoxland Sep 13, 2023
c0e0f68
Working on getting load to load
nvoxland Sep 14, 2023
c48c35e
Load passes
nvoxland Sep 14, 2023
fc58849
Don't create tiles/encoder dirs
nvoxland Sep 14, 2023
7ea3881
Fixing reading data with fake deeplog
nvoxland Sep 14, 2023
c283aab
upds
FayazRahman Sep 15, 2023
f79fbc2
update deeplog from storages
FayazRahman Sep 15, 2023
4397f48
Brought over existing deeplog code
nvoxland Sep 15, 2023
df015b5
add tensor action
FayazRahman Sep 15, 2023
9a24821
initial cmake changes
levongh Sep 15, 2023
c9f8343
add cmake file for google tests
levongh Sep 15, 2023
733ee00
Able to bind and call deeplog sample class from python
nvoxland Sep 16, 2023
1ca0e8a
Moved test location and naming pattern
nvoxland Sep 16, 2023
dad16c6
Bound real deeplog class as DeepLogCpp for now
nvoxland Sep 16, 2023
1cb3615
add CMakeLists for deeplog subfolder
levongh Sep 16, 2023
4e58396
Merge remote-tracking branch 'origin/main' into deeplog_cpp
nvoxland Sep 16, 2023
9347898
Merge branch 'deeplog_cmake_configure' into deeplog_cpp
nvoxland Sep 18, 2023
8a823d4
Improving cmake config
nvoxland Sep 18, 2023
1ffb6ce
Renamed deeplake namespace to deeplog
nvoxland Sep 18, 2023
0be111c
Improving cmake config
nvoxland Sep 18, 2023
21ae80f
Working on binding deeplog to python
nvoxland Sep 19, 2023
fe93119
Created snapshot and transaction objects
nvoxland Sep 20, 2023
d8ea9f9
Switched action fields to be public vs. behind a reader method
nvoxland Sep 20, 2023
86d4b78
Merge branch 'deeplog_cpp' into deeplog
nvoxland Sep 21, 2023
0c7210e
Merge remote-tracking branch 'origin/main' into deeplog
nvoxland Sep 21, 2023
55b27c2
Merge remote-tracking branch 'origin/main' into deeplog
nvoxland Sep 22, 2023
9f3a77a
update cmake build
FayazRahman Sep 24, 2023
09334a2
move to _deeplake subpackage
FayazRahman Sep 24, 2023
484dd1a
create tensor almost
FayazRahman Sep 25, 2023
b6544be
use snapshot for metadata + bug fix
FayazRahman Sep 25, 2023
de0d351
Shifting to more arrow-native data handling
nvoxland Sep 25, 2023
63c3267
Shifting to more arrow-native data handling
nvoxland Sep 25, 2023
e44317e
Finishing up move of action-selection logic to snapshots not deeplog
nvoxland Sep 27, 2023
e945b79
Improving serialization logic
nvoxland Sep 28, 2023
51dd630
Added "log_format" argument to dataset creation
nvoxland Sep 28, 2023
7a7a5b2
Include tensors in the list of actions in the schema
nvoxland Sep 28, 2023
4d76973
Added log library
nvoxland Sep 28, 2023
19cbf87
Working on tensor creation from python
nvoxland Sep 28, 2023
16ee744
create tensor working
FayazRahman Sep 28, 2023
e93650b
Merge branch 'deeplog' of https://github.com/activeloopai/deeplake in…
FayazRahman Sep 28, 2023
160e1e7
smol
FayazRahman Sep 28, 2023
791af76
Merge branch 'main' into deeplog
nvoxland Oct 2, 2023
333289c
Merge remote-tracking branch 'origin/deeplog' into deeplog
nvoxland Oct 2, 2023
d26f312
Created storage module and wrapper for python version
nvoxland Oct 3, 2023
4bd1dd9
Fixed checkpoint logic to collapse actions
nvoxland Oct 3, 2023
58c8795
fixes + adding basic samples
FayazRahman Oct 4, 2023
9001252
Merge branch 'deeplog' of https://github.com/activeloopai/deeplake in…
FayazRahman Oct 4, 2023
67fb090
- Moved metadata commits to a separate _meta directory
nvoxland Oct 5, 2023
7051d63
write and read working
FayazRahman Oct 5, 2023
ad2ce3e
merge confl
FayazRahman Oct 5, 2023
97c426c
fix test
FayazRahman Oct 5, 2023
9fa7823
Merge branch 'main' into deeplog
nvoxland Oct 5, 2023
1c94192
Correctly store create_tensor_action.link()
nvoxland Oct 10, 2023
61649a5
Merge remote-tracking branch 'origin/main' into deeplog
nvoxland Oct 10, 2023
51128b5
Ensure correct versions are committed
nvoxland Oct 10, 2023
434aa9a
Ensure correct versions are committed
nvoxland Oct 10, 2023
f36451f
Added base_test class
nvoxland Oct 10, 2023
0c9c7a8
Fixing tensor_link handling
nvoxland Oct 13, 2023
f5dd723
Fixing tensor_link handling
nvoxland Oct 13, 2023
6 changes: 6 additions & 0 deletions .gitignore
@@ -23,7 +23,10 @@ dask-worker-space/
###################
*.com
*.class
*.exp
*.lib
*.dll
*.pdb
*.exe
*.o
*.so
@@ -218,3 +221,6 @@ benchmarks/torch_data
# API docs
api_docs/

/cpp/cmake-build-debug/
/cpp/external/
_deeplake/**/*.pyi
35 changes: 35 additions & 0 deletions cpp/CMakeLists.txt
@@ -0,0 +1,35 @@
cmake_minimum_required(VERSION 3.16)
project(deeplake)

set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# Avoid warning about DOWNLOAD_EXTRACT_TIMESTAMP in CMake 3.24:
if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.24.0")
cmake_policy(SET CMP0135 NEW)
endif()

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED True)
option(PYTHON_EXECUTABLE "Path to python executable")

if(APPLE)
set (CMAKE_OSX_DEPLOYMENT_TARGET 10.15)
else()
#skip multi architecture build for linux
set (CMAKE_OSX_ARCHITECTURES)
endif()

set(DEFAULT_PARENT_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(PYTHON_SOURCE ${CMAKE_CURRENT_SOURCE_DIR}/../)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/FindStduuid.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/FindJson.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/FindPybind11.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/FindGoogleTest.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/FindSpdlog.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/FindBackward.cmake)

add_subdirectory(storage)
add_subdirectory(deeplog)
add_subdirectory(tests)
add_subdirectory(py_api)

13 changes: 13 additions & 0 deletions cpp/cmake/FindBackward.cmake
@@ -0,0 +1,13 @@
include(FetchContent)

set(backward_URL https://github.com/bombela/backward-cpp/archive/refs/tags/v1.6.tar.gz)
set(backward_URL_HASH c654d0923d43f1cea23d086729673498e4741fb2457e806cfaeaea7b20c97c10)
set(backward_SOURCE_DIR ${DEFAULT_PARENT_DIR}/external/backward)

FetchContent_Declare(
backward
URL ${backward_URL}
URL_HASH SHA256=${backward_URL_HASH}
SOURCE_DIR ${backward_SOURCE_DIR}
)
FetchContent_MakeAvailable(backward)
15 changes: 15 additions & 0 deletions cpp/cmake/FindGoogleTest.cmake
@@ -0,0 +1,15 @@
include(FetchContent)

set(googletest_URL https://github.com/google/googletest.git)
set(googletest_TAG v1.12.0)
set(googletest_SOURCE_DIR ${DEFAULT_PARENT_DIR}/external/googletest)
set(googletest_INCLUDE_DIRS ${googletest_SOURCE_DIR}/googletest/include ${googletest_SOURCE_DIR}/googlemock/include)

FetchContent_Declare(googletest
GIT_REPOSITORY ${googletest_URL}
GIT_TAG ${googletest_TAG}
SOURCE_DIR ${googletest_SOURCE_DIR}
)

FetchContent_MakeAvailable(googletest)
include_directories(${googletest_INCLUDE_DIRS})
13 changes: 13 additions & 0 deletions cpp/cmake/FindJson.cmake
@@ -0,0 +1,13 @@
include(FetchContent)

set(json_URL https://github.com/nlohmann/json/releases/download/v3.11.2/json.tar.xz)
set(json_URL_HASH 8c4b26bf4b422252e13f332bc5e388ec0ab5c3443d24399acb675e68278d341f)
set(json_SOURCE_DIR ${DEFAULT_PARENT_DIR}/external/json)

FetchContent_Declare(
json
URL ${json_URL}
URL_HASH SHA256=${json_URL_HASH}
SOURCE_DIR ${json_SOURCE_DIR}
)
FetchContent_MakeAvailable(json)
13 changes: 13 additions & 0 deletions cpp/cmake/FindPybind11.cmake
@@ -0,0 +1,13 @@
include(FetchContent)

set(pybind11_URL https://github.com/pybind/pybind11.git)
set(pybind11_TAG v2.11.1)
set(pybind11_SOURCE_DIR ${DEFAULT_PARENT_DIR}/external/pybind11)

FetchContent_Declare(pybind11
GIT_REPOSITORY ${pybind11_URL}
GIT_TAG ${pybind11_TAG}
SOURCE_DIR ${pybind11_SOURCE_DIR}
)

FetchContent_MakeAvailable(pybind11)
13 changes: 13 additions & 0 deletions cpp/cmake/FindSpdlog.cmake
@@ -0,0 +1,13 @@
include(FetchContent)

set(spdlog_URL https://github.com/gabime/spdlog/archive/refs/tags/v1.12.0.tar.gz)
set(spdlog_URL_HASH 4dccf2d10f410c1e2feaff89966bfc49a1abb29ef6f08246335b110e001e09a9)
set(spdlog_SOURCE_DIR ${DEFAULT_PARENT_DIR}/external/spdlog)

FetchContent_Declare(
spdlog
URL ${spdlog_URL}
URL_HASH SHA256=${spdlog_URL_HASH}
SOURCE_DIR ${spdlog_SOURCE_DIR}
)
FetchContent_MakeAvailable(spdlog)
13 changes: 13 additions & 0 deletions cpp/cmake/FindStduuid.cmake
@@ -0,0 +1,13 @@
include(FetchContent)

set(stduuid_URL https://github.com/mariusbancila/stduuid/archive/refs/tags/v1.2.3.zip)
set(stduuid_URL_HASH 0f867768ce55f2d8fa361be82f87f0ea5e51438bc47ca30cd92c9fd8b014e84e)
set(stduuid_SOURCE_DIR ${DEFAULT_PARENT_DIR}/external/stduuid)

FetchContent_Declare(
stduuid
URL ${stduuid_URL}
URL_HASH SHA256=${stduuid_URL_HASH}
SOURCE_DIR ${stduuid_SOURCE_DIR}
)
FetchContent_MakeAvailable(stduuid)
18 changes: 18 additions & 0 deletions cpp/deeplog/CMakeLists.txt
@@ -0,0 +1,18 @@
project(deeplog)

include(FetchContent)

find_package(Arrow REQUIRED)
find_package(Parquet REQUIRED)
find_package(ArrowDataset REQUIRED)

file(GLOB_RECURSE SOURCES "*.cpp")

add_library(deeplog ${SOURCES})

target_link_libraries(deeplog PUBLIC storage)
target_link_libraries(deeplog PUBLIC stduuid nlohmann_json::nlohmann_json)
target_link_libraries(deeplog PUBLIC "$<IF:$<BOOL:${ARROW_BUILD_STATIC}>,Arrow::arrow_static,Arrow::arrow_shared>")
target_link_libraries(deeplog PUBLIC "$<IF:$<BOOL:${ARROW_BUILD_STATIC}>,Parquet::parquet_static,Parquet::parquet_shared>")
target_link_libraries(deeplog PUBLIC "$<IF:$<BOOL:${ARROW_BUILD_STATIC}>,ArrowDataset::arrow_dataset_static,ArrowDataset::arrow_dataset_shared>")
target_link_libraries(deeplog PUBLIC spdlog::spdlog)
6 changes: 6 additions & 0 deletions cpp/deeplog/actions/action.cpp
@@ -0,0 +1,6 @@
#include "action.hpp"

namespace deeplog {


};
17 changes: 17 additions & 0 deletions cpp/deeplog/actions/action.hpp
@@ -0,0 +1,17 @@
#pragma once

#include <arrow/api.h>
#include "deeplog_serializable.hpp"

namespace deeplog {

class action : public deeplog_serializable {

public:
virtual nlohmann::json to_json() = 0;

virtual std::string action_name() = 0;

virtual std::shared_ptr<arrow::StructType> action_type() = 0;
};
}
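The action base class above is an abstract interface: each concrete action reports its log name, its JSON form and its Arrow struct type. Below is a minimal, dependency-free sketch of that shape, assuming plain strings in place of nlohmann::json and arrow::StructType; sketch_action, sketch_branch_action and wrap_action are hypothetical names, not part of the PR. The sketch also declares a virtual destructor, which action does not declare itself (the deeplog_serializable base, not shown in this diff, may already provide one). Wrapping each entry under its action name, as in {"branch":{...}}, is an assumption modeled on Delta Lake's log format, which field names such as modificationTime and dataChange resemble.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, dependency-free stand-in for deeplog::action. The real class
// returns nlohmann::json and an arrow::StructType; plain strings are used here
// so the sketch compiles without Arrow or nlohmann/json.
class sketch_action {
public:
    // A virtual destructor lets derived actions be owned through base pointers.
    virtual ~sketch_action() = default;
    virtual std::string action_name() const = 0;
    virtual std::string to_json() const = 0;
};

class sketch_branch_action : public sketch_action {
public:
    explicit sketch_branch_action(std::string name) : name_(std::move(name)) {}
    std::string action_name() const override { return "branch"; }
    // No string escaping: illustration only.
    std::string to_json() const override { return "{\"name\":\"" + name_ + "\"}"; }

private:
    std::string name_;
};

// Assumed Delta-Lake-style envelope: the action's JSON keyed by its action_name().
inline std::string wrap_action(const sketch_action &a) {
    return "{\"" + a.action_name() + "\":" + a.to_json() + "}";
}
```

Held in a std::vector<std::unique_ptr<sketch_action>>, each derived action is destroyed through the base pointer, which is exactly the case the virtual destructor guards.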
46 changes: 46 additions & 0 deletions cpp/deeplog/actions/add_file_action.cpp
@@ -0,0 +1,46 @@
#include "add_file_action.hpp"

namespace deeplog {

std::shared_ptr<arrow::StructType> add_file_action::arrow_type = std::dynamic_pointer_cast<arrow::StructType>(
arrow::struct_({
arrow::field("path", arrow::utf8(), true),
arrow::field("type", arrow::utf8(), true),
arrow::field("size", arrow::uint64(), true),
arrow::field("modificationTime", arrow::uint64(), true),
arrow::field("dataChange", arrow::boolean(), true),
arrow::field("numSamples", arrow::uint64(), true),
}));

add_file_action::add_file_action(std::string path, std::string type, const long &size, const long &modification_time, const bool &data_change, const long &num_samples) :
path(std::move(path)), type(std::move(type)), size(size), modification_time(modification_time), data_change(data_change), num_samples(num_samples) {}

add_file_action::add_file_action(const std::shared_ptr<arrow::StructScalar> &value) {
path = from_struct<std::string>("path", value).value();
type = from_struct<std::string>("type", value).value();
size = from_struct<unsigned long>("size", value).value();
modification_time = from_struct<long>("modificationTime", value).value();
data_change = from_struct<bool>("dataChange", value).value();
num_samples = from_struct<unsigned long>("numSamples", value).value();
}

std::string add_file_action::action_name() {
return "add";
}

std::shared_ptr<arrow::StructType> add_file_action::action_type() {
return arrow_type;
}

nlohmann::json add_file_action::to_json() {
nlohmann::json json;
json["path"] = path;
json["type"] = type;
json["size"] = size;
json["modificationTime"] = modification_time;
json["dataChange"] = data_change;
json["numSamples"] = num_samples;

return json;
}
}
29 changes: 29 additions & 0 deletions cpp/deeplog/actions/add_file_action.hpp
@@ -0,0 +1,29 @@
#pragma once

#include "action.hpp"

namespace deeplog {
class add_file_action : public action {

public:
std::string path;
std::string type;
unsigned long size;
long modification_time;
bool data_change;
unsigned long num_samples;

public:
static std::shared_ptr<arrow::StructType> arrow_type;

add_file_action(std::string path, std::string type, const long &size, const long &modification_time, const bool &data_change, const long &num_samples);

explicit add_file_action(const std::shared_ptr<arrow::StructScalar> &struct_scalar);

nlohmann::json to_json() override;

std::string action_name() override;

std::shared_ptr<arrow::StructType> action_type() override;
};
}
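add_file_action declares every numeric column as arrow::uint64(), yet its constructor takes const long & for size, modification_time and num_samples, and the modification_time member is a signed long. The sketch below is a hypothetical plain struct (add_file_sketch, not part of the PR) that keeps all three counters as std::uint64_t, matching the Arrow schema, and emits the same camelCase keys as add_file_action::to_json(). It builds JSON by hand only to stay self-contained and does no string escaping, so real code should keep using nlohmann::json.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical mirror of add_file_action's fields. Unsigned 64-bit integers
// line up with the arrow::uint64() columns in add_file_action::arrow_type.
struct add_file_sketch {
    std::string path;
    std::string type;
    std::uint64_t size = 0;
    std::uint64_t modification_time = 0;
    bool data_change = false;
    std::uint64_t num_samples = 0;

    // Same camelCase keys as add_file_action::to_json(); no escaping is done.
    std::string to_json() const {
        std::ostringstream out;
        out << "{\"path\":\"" << path << "\","
            << "\"type\":\"" << type << "\","
            << "\"size\":" << size << ','
            << "\"modificationTime\":" << modification_time << ','
            << "\"dataChange\":" << (data_change ? "true" : "false") << ','
            << "\"numSamples\":" << num_samples << '}';
        return out.str();
    }
};
```

With made-up values, add_file_sketch{"chunks/0", "chunk", 1024, 1694500000000, true, 100}.to_json() yields {"path":"chunks/0","type":"chunk","size":1024,"modificationTime":1694500000000,"dataChange":true,"numSamples":100}.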
42 changes: 42 additions & 0 deletions cpp/deeplog/actions/create_branch_action.cpp
@@ -0,0 +1,42 @@
#include "create_branch_action.hpp"

#include <utility>

namespace deeplog {
std::shared_ptr<arrow::StructType> create_branch_action::arrow_type = std::dynamic_pointer_cast<arrow::StructType>(
arrow::struct_({
arrow::field("id", arrow::utf8(), true),
arrow::field("name", arrow::utf8(), true),
arrow::field("fromId", arrow::utf8(), true),
arrow::field("fromVersion", arrow::uint64(), true),
}));


create_branch_action::create_branch_action(std::string id, std::string name, std::optional<std::string> from_id, const std::optional<unsigned long> &from_version) :
id(std::move(id)), name(std::move(name)), from_id(std::move(from_id)), from_version(from_version) {}

create_branch_action::create_branch_action(const std::shared_ptr<arrow::StructScalar> &value) {
id = from_struct<std::string>("id", value).value();
name = from_struct<std::string>("name", value).value();
from_id = from_struct<std::string>("fromId", value);
from_version = from_struct<long>("fromVersion", value);
}

std::string create_branch_action::action_name() {
return "branch";
}

std::shared_ptr<arrow::StructType> create_branch_action::action_type() {
return arrow_type;
}

nlohmann::json create_branch_action::to_json() {
nlohmann::json json;
json["id"] = id;
json["name"] = name;
json["fromId"] = to_json_value(from_id);
json["fromVersion"] = to_json_value(from_version);

return json;
}
}
27 changes: 27 additions & 0 deletions cpp/deeplog/actions/create_branch_action.hpp
@@ -0,0 +1,27 @@
#pragma once

#include "action.hpp"

namespace deeplog {
class create_branch_action : public action {

public:
std::string id;
std::string name;
std::optional<std::string> from_id;
std::optional<unsigned long> from_version;

public:
static std::shared_ptr<arrow::StructType> arrow_type;

create_branch_action(std::string id, std::string name, std::optional<std::string> from_id, const std::optional<unsigned long> &from_version);

explicit create_branch_action(const std::shared_ptr<arrow::StructScalar> &struct_scalar);

nlohmann::json to_json() override;

std::string action_name() override;

std::shared_ptr<arrow::StructType> action_type() override;
};
}
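create_branch_action keeps from_id and from_version as std::optional, and its to_json() passes them through a to_json_value helper whose definition is not part of this diff. Presumably an empty optional, such as a root branch with no parent, is written as JSON null. The hypothetical to_json_value_sketch below illustrates that assumed behavior for string and numeric fields; it is not the PR's helper.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical stand-in for to_json_value(): an empty optional becomes JSON
// null; a present value is rendered as a JSON string or number literal.
template <typename T>
std::string to_json_value_sketch(const std::optional<T> &value) {
    if (!value.has_value()) {
        return "null";
    }
    if constexpr (std::is_same_v<T, std::string>) {
        return "\"" + *value + "\"";  // no escaping, for illustration only
    } else {
        return std::to_string(*value);
    }
}
```

Under this assumption a branch created from nothing serializes as "fromId":null,"fromVersion":null, while a branch forked at version 3 of another branch carries the parent id as a string and the version as the number 3.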
44 changes: 44 additions & 0 deletions cpp/deeplog/actions/create_commit_action.cpp
@@ -0,0 +1,44 @@
#include "create_commit_action.hpp"

namespace deeplog {

std::shared_ptr<arrow::StructType> create_commit_action::arrow_type = std::dynamic_pointer_cast<arrow::StructType>(
arrow::struct_({
arrow::field("id", arrow::utf8(), true),
arrow::field("branchId", arrow::utf8(), true),
arrow::field("branchVersion", arrow::uint64(), true),
arrow::field("message", arrow::utf8(), true),
arrow::field("commitTime", arrow::uint64(), true),
}));

create_commit_action::create_commit_action(std::string id, std::string branch_id, const unsigned long &branch_version, const std::optional<std::string> &message, const long &commit_time) :
id(std::move(id)), branch_id(std::move(branch_id)), branch_version(branch_version), message(std::move(message)), commit_time(commit_time) {}

create_commit_action::create_commit_action(const std::shared_ptr<arrow::StructScalar> &value) {
id = from_struct<std::string>("id", value).value();
branch_id = from_struct<std::string>("branchId", value).value();
branch_version = from_struct<long>("branchVersion", value).value();
message = from_struct<std::string>("message", value);
commit_time = from_struct<long>("commitTime", value).value();
}

std::string create_commit_action::action_name() {
return "commit";
}

std::shared_ptr<arrow::StructType> create_commit_action::action_type() {
return arrow_type;
}

nlohmann::json create_commit_action::to_json() {
nlohmann::json json;

json["id"] = id;
json["branchId"] = branch_id;
json["branchVersion"] = branch_version;
json["message"] = to_json_value<std::string>(message);
json["commitTime"] = commit_time;

return json;
}
}
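The struct-scalar constructors split fields into required ones, read with from_struct<T>(...).value(), and optional ones kept as std::optional (message here, fromId and fromVersion in create_branch_action). Assuming from_struct returns an empty optional for a null or missing field, a null required field makes .value() throw std::bad_optional_access. The sketch below models that with a std::map standing in for arrow::StructScalar; fake_struct, from_struct_sketch and commit_sketch are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for an arrow::StructScalar: field name -> value, with
// std::nullopt modeling a null field. Values are strings to avoid Arrow.
using fake_struct = std::map<std::string, std::optional<std::string>>;

// Mirrors the assumed shape of from_struct<T>(): std::nullopt when the field
// is missing or null, otherwise the stored value.
inline std::optional<std::string> from_struct_sketch(const std::string &field, const fake_struct &row) {
    auto it = row.find(field);
    if (it == row.end()) {
        return std::nullopt;
    }
    return it->second;
}

struct commit_sketch {
    std::string id;                      // required: .value() throws if null
    std::optional<std::string> message;  // optional: null stays std::nullopt

    explicit commit_sketch(const fake_struct &row)
        : id(from_struct_sketch("id", row).value()),
          message(from_struct_sketch("message", row)) {}
};
```

A row without an id therefore fails loudly at load time instead of producing an action with an empty identifier, while a commit without a message simply keeps message empty.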