Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TopicSegmentationRegularizer implementation in biagartm9.0 #917

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Seriont
Copy link

@Seriont Seriont commented Jul 19, 2018

No description provided.


if background_topic_names is not None:
if isinstance(background_topic_names, string_types):
background_topic_names = [background_topic_names]
for topic_name in background_topic_names:
self._config.background_topic_names.append(topic_name)
elif config is not None and len(config.background_topic_names):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you delete loading of parameters from config? Also you should add class fields for new parameters, add setters/getters and make parameter loading from config for them too.

@@ -1,9 +1,10 @@
// Copyright 2017, Additive Regularization of Topic Models.

// Author: Anastasia Bayandina (anast.bayandina@gmail.com)
// Author: Anastasia Bayandina
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add yourself here with email

@@ -16,107 +17,272 @@ namespace regularizer {

void TopicSegmentationPtdwAgent::Apply(int item_index, int inner_iter,
::artm::utility::LocalPhiMatrix<float>* ptdw) const {
const int local_token_size = ptdw->no_rows();
const int topic_size = ptdw->no_columns();
int local_token_size = ptdw->no_rows();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove const qualifiers?

if (is_background_topic[k]) {
background_probability[i] += local_ptdw_ptr[k];
}
}
}
}
//for (int i = 0; i < local_token_size; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that this code is dirty, but try not to pull request temp comments, you can store them in a local branch (thx git:))

float tau = tau_;
float alpha = config_.merge_threshold();
bool merge_into_segments = config_.merge_into_segments();
std::list<int> &dot_positions = dot_positions_[item_index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type& variable;

std::shared_ptr<RegularizePtdwAgent>
TopicSegmentationPtdw::CreateRegularizePtdwAgent(const Batch& batch,
const ProcessBatchesArgs& args, float tau) {
TopicSegmentationPtdwAgent* agent = new TopicSegmentationPtdwAgent(config_, args, tau);
std::vector< std::list<int> > dot_positions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need in space into template angles

int token_id = item.transaction_token_id(token_index);
if (batch.token(token_id) == ".") {
current_dots.push_back(token_index);
dot_count += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@@ -20,18 +20,21 @@ class TopicSegmentationPtdwAgent : public RegularizePtdwAgent {
friend class TopicSegmentationPtdw;
TopicSegmentationPtdwConfig config_;
ProcessBatchesArgs args_;
float tau_;
mutable std::vector< std::list<int> > dot_positions_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove spaces

: config_(config)
, args_(args) { }
TopicSegmentationPtdwAgent(const TopicSegmentationPtdwConfig& config, const ProcessBatchesArgs& args,
double tau, std::vector< std::list<int> > dot_positions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return parameters as column
: par1
, par2

and add space between {}


virtual void Apply(int item_index, int inner_iter, ::artm::utility::LocalPhiMatrix<float>* ptdw) const;
};

class TopicSegmentationPtdw : public RegularizerInterface {
public:
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config) : config_(config) { }
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config)
: config_(config) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ }

Copy link
Contributor

@MelLain MelLain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good! There're some minor notes, the most important is to replace it++ to ++it for every iterator in code, as first does unnecessary copy


// Author: Anastasia Bayandina (anast.bayandina@gmail.com)
// Author: Nikolay Skachkov
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

email, please)


#include <vector>
#include <algorithm>
#include <cmath>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c headers should go first in list of imports

@@ -14,109 +15,186 @@
namespace artm {
namespace regularizer {


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need in this empty line

std::vector<float> background_probability(local_token_size, 0.0f);

// if background topics are given, count probability for each word to be background
//// if background topics are given, count probability for each word to be background
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//// -> //

std::list<int>& dot_positions = dot_positions_[item_index];
float average_len = 0.0f;
int seg_begin = 0;
for (auto it = dot_positions.begin(); it != dot_positions.end(); it++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it++ -> ++it

sen_begin = sen_end + 1;
prev_sen_subject = sen_subj;
prev_seg_len = sen_len;
it++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it++ -> ++it

(*ptdw)(i, k) = 0.0f;
sigma /= static_cast<int>(distances.size());
int i = 0;
std::list<int>::iterator it = dot_positions.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto

if (distances[i] > mean_dist + alpha * sigma) {
it = dot_positions.erase(it);
count++;
it--;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--it

: config_(config),
args_(args),
tau_(tau),
dot_positions_(dot_positions) { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: a1
, a2
, a3
...


virtual void Apply(int item_index, int inner_iter, ::artm::utility::LocalPhiMatrix<float>* ptdw) const;
};

class TopicSegmentationPtdw : public RegularizerInterface {
public:
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config) : config_(config) { }
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config)
: config_(config) { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return into one line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants