Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New structure for internal methods to reduce code repetition #41

Open
csemken opened this issue Jun 3, 2020 · 2 comments
Open

New structure for internal methods to reduce code repetition #41

csemken opened this issue Jun 3, 2020 · 2 comments

Comments

@csemken
Copy link
Contributor

csemken commented Jun 3, 2020

Currently, the structure of Stargazer’s internal methods for generating the table in different formats (generate_header_html, generate_r2_latex, etc.) makes it easy to navigate the code, but also leads to a lot of repetitive code. This makes changes and additions difficult. I give two examples of this below and suggest a solution.

Problem 1: repetition within a format

def generate_observations_html(self):
    obs_text = ''
    if not self.show_n:
        return obs_text
    obs_text += '<tr><td style="text-align: left">Observations</td>'
    for md in self.model_data:
        obs_text += '<td>' + str(md['degree_freedom'] + md['degree_freedom_resid'] + 1) + '</td>'
    obs_text += '</tr>'
    return obs_text

def generate_r2_html(self):
    r2_text = ''
    if not self.show_r2:
        return r2_text
    r2_text += '<tr><td style="text-align: left">R<sup>2</sup></td>'
    for md in self.model_data:
        r2_text += '<td>' + str(round(md['r2'], self.sig_digits)) + '</td>'
    r2_text += '</tr>'
    return r2_text

These are just two functions that are used to render the HTML table. Note that the only difference are: the show_* boolean to check, the title of the column and the content. The formatting and the code to produce the table row is exactly the same.

As a result, adding a new statistic requires copying all of the above code. Similarly, adding a tag to the tr or td elements would already require over 10 changes in the code.

Problem 2: Repetition between formats

def generate_footer_html(self):
    footer = '<td colspan="' + str(self.num_models + 1) + '" style="border-bottom: 1px solid black"></td></tr>'
    ...
    footer += self.generate_observations_html()
    footer += self.generate_r2_html()
    ...
    return footer

def generate_footer_latex(self):
    footer = '\\hline \\\\[-1.8ex]\n'
    ...
    footer += self.generate_observations_latex()
    footer += self.generate_r2_latex()
    ...
    return footer

There is also a lot of repetitive code between formats. Specifically, all generate_* methods are implemented twice (once for HTML and once for Latex).

Again, this means that any change to the table needs to be repeated twice, even when the change has nothing to do with the formatting (e.g. #11). This problem will only get worse as more formats are added.

Solution: a new structure

Here is a sketch of the new structure I propose:

def render_html(self):
    html = '<table>'
    ...
    html += _gen_footer(format='html')
    html += '</table>'
    return html

def _gen_footer(self, format):
    footer = self._gen_sep(format)
    ...
    if self.show_n:
        n_row = ['Observations'] + [md['degree_freedom'] + md['degree_freedom_resid'] + 1 for md in self.model_data]
        footer += self._gen_row(n_row, format)
    ...
    return footer

def _gen_sep(self, format):
    if format == 'html': 
        return f'<tr><td colspan="{self.num_models + 1}" '
                'style="border-bottom: 1px solid black"></td></tr>'
    elif format == 'latex':
        return '\\hline \\\\[-1.8ex]\n'
    

def _gen_row(self, columns, format):
    first_col, *main_cols = columns 
    if format == 'html':
        row = '<tr>'
        row += f'<td>{first_col}</td>'
        for col in main_cols:
            row += f'<td>{col}</td>'
        row += '</tr>'
    ...
    return row

This restructuring of the code removes the above-described code repetitions. Thus, bug fixes and the implementation of new formats would require a lot fewer changes to the code.

At the same time, it preserves much of the seperation into smaller pieces of code, meaning it remains easily navigatable. Moreove, one can still implement a custom render function (not use generic _gen functions), if need be (e.g. if the formatting of one line is affected by the next).

Since the change only affects internal methods (not the render options or render_* methods), it is fully backwards compatible.

Note that the above sketch uses newer language features (f-strings) for brevity. To keep stargazer compatible with Python 3.0, these would need to be replace by .format().

@MaxGhenis
Copy link
Contributor

Agreed, your proposed solution sounds great. A utility to concatenate fields through the for md in self.model_data: structure could also reduce code duplication.

@toobaz
Copy link
Collaborator

toobaz commented Jun 4, 2020

Quite different from your proposal @csemken , but should make it easier to apply it (at least we avoid the format parameter, and it should be easier to isolate format-agnostic code parts): #46 , comments welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants