New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TopicSegmentationRegularizer implementation in biagartm9.0 #917
base: master
Are you sure you want to change the base?
Conversation
python/artm/regularizers.py
Outdated
|
||
if background_topic_names is not None: | ||
if isinstance(background_topic_names, string_types): | ||
background_topic_names = [background_topic_names] | ||
for topic_name in background_topic_names: | ||
self._config.background_topic_names.append(topic_name) | ||
elif config is not None and len(config.background_topic_names): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you delete loading of parameters from config? Also you should add class fields for new parameters, add setters/getters and make parameter loading from config for them too.
@@ -1,9 +1,10 @@ | |||
// Copyright 2017, Additive Regularization of Topic Models. | |||
|
|||
// Author: Anastasia Bayandina (anast.bayandina@gmail.com) | |||
// Author: Anastasia Bayandina |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add yourself here with email
@@ -16,107 +17,272 @@ namespace regularizer { | |||
|
|||
void TopicSegmentationPtdwAgent::Apply(int item_index, int inner_iter, | |||
::artm::utility::LocalPhiMatrix<float>* ptdw) const { | |||
const int local_token_size = ptdw->no_rows(); | |||
const int topic_size = ptdw->no_columns(); | |||
int local_token_size = ptdw->no_rows(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove const qualifiers?
if (is_background_topic[k]) { | ||
background_probability[i] += local_ptdw_ptr[k]; | ||
} | ||
} | ||
} | ||
} | ||
//for (int i = 0; i < local_token_size; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that this code is dirty, but try not to pull request temp comments, you can store them in a local branch (thx git:))
float tau = tau_; | ||
float alpha = config_.merge_threshold(); | ||
bool merge_into_segments = config_.merge_into_segments(); | ||
std::list<int> &dot_positions = dot_positions_[item_index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type& variable;
std::shared_ptr<RegularizePtdwAgent> | ||
TopicSegmentationPtdw::CreateRegularizePtdwAgent(const Batch& batch, | ||
const ProcessBatchesArgs& args, float tau) { | ||
TopicSegmentationPtdwAgent* agent = new TopicSegmentationPtdwAgent(config_, args, tau); | ||
std::vector< std::list<int> > dot_positions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need in space into template angles
int token_id = item.transaction_token_id(token_index); | ||
if (batch.token(token_id) == ".") { | ||
current_dots.push_back(token_index); | ||
dot_count += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
@@ -20,18 +20,21 @@ class TopicSegmentationPtdwAgent : public RegularizePtdwAgent { | |||
friend class TopicSegmentationPtdw; | |||
TopicSegmentationPtdwConfig config_; | |||
ProcessBatchesArgs args_; | |||
float tau_; | |||
mutable std::vector< std::list<int> > dot_positions_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove spaces
: config_(config) | ||
, args_(args) { } | ||
TopicSegmentationPtdwAgent(const TopicSegmentationPtdwConfig& config, const ProcessBatchesArgs& args, | ||
double tau, std::vector< std::list<int> > dot_positions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return parameters as column
: par1
, par2
and add space between {}
|
||
virtual void Apply(int item_index, int inner_iter, ::artm::utility::LocalPhiMatrix<float>* ptdw) const; | ||
}; | ||
|
||
class TopicSegmentationPtdw : public RegularizerInterface { | ||
public: | ||
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config) : config_(config) { } | ||
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config) | ||
: config_(config) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good! There're some minor notes, the most important is to replace it++ to ++it for every iterator in code, as first does unnecessary copy
|
||
// Author: Anastasia Bayandina (anast.bayandina@gmail.com) | ||
// Author: Nikolay Skachkov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
email, please)
|
||
#include <vector> | ||
#include <algorithm> | ||
#include <cmath> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c headers should go first in list of imports
@@ -14,109 +15,186 @@ | |||
namespace artm { | |||
namespace regularizer { | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need in this empty line
std::vector<float> background_probability(local_token_size, 0.0f); | ||
|
||
// if background topics are given, count probability for each word to be background | ||
//// if background topics are given, count probability for each word to be background |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//// -> //
std::list<int>& dot_positions = dot_positions_[item_index]; | ||
float average_len = 0.0f; | ||
int seg_begin = 0; | ||
for (auto it = dot_positions.begin(); it != dot_positions.end(); it++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it++ -> ++it
sen_begin = sen_end + 1; | ||
prev_sen_subject = sen_subj; | ||
prev_seg_len = sen_len; | ||
it++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it++ -> ++it
(*ptdw)(i, k) = 0.0f; | ||
sigma /= static_cast<int>(distances.size()); | ||
int i = 0; | ||
std::list<int>::iterator it = dot_positions.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto
if (distances[i] > mean_dist + alpha * sigma) { | ||
it = dot_positions.erase(it); | ||
count++; | ||
it--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--it
: config_(config), | ||
args_(args), | ||
tau_(tau), | ||
dot_positions_(dot_positions) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: a1
, a2
, a3
...
|
||
virtual void Apply(int item_index, int inner_iter, ::artm::utility::LocalPhiMatrix<float>* ptdw) const; | ||
}; | ||
|
||
class TopicSegmentationPtdw : public RegularizerInterface { | ||
public: | ||
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config) : config_(config) { } | ||
explicit TopicSegmentationPtdw(const TopicSegmentationPtdwConfig& config) | ||
: config_(config) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return into one line
No description provided.