Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Convert Function as discussed on Issue#22 #33

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

heisenbuug
Copy link

@heisenbuug heisenbuug commented Aug 24, 2020

Issue link

Added a new folder dataset_utils and currently the convert function can convert

  • csv -> json
  • csv -> xml

@mlpack-bot
Copy link

mlpack-bot bot commented Aug 24, 2020

Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the mean time, please be sure that you've handled the following things, to make the review process quicker and easier:

  • All code should follow the style guide
  • Documentation added for any new functionality
  • Tests added for any new functionality
  • Tests that are added follow the testing guide
  • Headers and license information added to the top of any new code files
  • HISTORY.md updated if the changes are big or user-facing
  • All CI checks should be passing

Thank you again for your contributions! 👍

Copy link
Member

@kartikdutt18 kartikdutt18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @heisenbuug, Thanks a lot for working on this. I've left some style comments other than that it looks good to me.
Could you also add tests for this, we can download files from the net during runtime using utils in models repo (Refer other tests). Again, Thanks for working on this. 👍

dataset_utils/convert.cpp Outdated Show resolved Hide resolved
dataset_utils/convert.cpp Outdated Show resolved Hide resolved
dataset_utils/convert.cpp Outdated Show resolved Hide resolved
Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>
@mlpack-bot
Copy link

mlpack-bot bot commented Oct 30, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot
Copy link

mlpack-bot bot commented Mar 21, 2021

Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the mean time, please be sure that you've handled the following things, to make the review process quicker and easier:

  • All code should follow the style guide
  • Documentation added for any new functionality
  • Tests added for any new functionality
  • Tests that are added follow the testing guide
  • Headers and license information added to the top of any new code files
  • HISTORY.md updated if the changes are big or user-facing
  • All CI checks should be passing

Thank you again for your contributions! 👍

Copy link
Member

@kartikdutt18 kartikdutt18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @heisenbuug, Sorry I forgot about this PR. This looks promising, Let me know if you need any assistance. Thanks for this PR.

ctr++;
}

auto create_JSON(std::vector<std::string>& tags, std::vector<std::string> rows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, can we use camel case to be consistent.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heisenbuug, Can I do it for you?

Comment on lines +19 to +28
auto tokenize(std::string& line)
{
std::vector<std::string> col_names;
tokenizer<escaped_list_separator<char> > tk(line, escaped_list_separator<char>());
for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin()); i != tk.end(); ++i)
col_names.push_back(*i);
return col_names;
}

auto create_XML(std::vector<std::string>& tags, std::vector<std::string> rows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some issue with indentation. Also, can we replace auto with data type (makes it easier to figure out all properties).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heisenbuug, Can I also do this?

using namespace boost::property_tree;
using namespace boost;

class Convert
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, can we also add class description and usage here.

}

public:
void convert(std::string path, std::string to)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this function static. No need to declare an object for converting datasets. Also Capitalize the first letter of the function.

@heisenbuug
Copy link
Author

Hey @kartikdutt18, I will add json. xml -> csv by the end of today or tomorrow...
I will also keep in mind the edits you suggested in here...

@heisenbuug
Copy link
Author

heisenbuug commented Mar 24, 2021

Hey @kartikdutt18, I noticed something...
If you go to the example here to see how the convert function works you can see that we gave an input .csv file with 10 rows and we got 1 XML file per row as output making total to 10 .xml files as output, is this right? Or do we also want an option to put all these in the same .xml file?

Shouldn't we have to put all of them in 1 file?

Also, we are planning to reduce boost dependencies, right? So should we use boost here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants