
Generating docs with GPT-4 #2064

Open
wants to merge 1 commit into
base: master

Conversation

@ekiefl
Contributor

ekiefl commented Apr 10, 2023

Hey @merenlab/all, I found this quite interesting and thought I would share.

I've been exploring use cases for ChatGPT, and one thing I wanted to test was how well it could generate docstrings. Spoiler alert: very accurately. This isn't a serious PR (I didn't even finish the db.py module), but I think the results are quite impressive.

I've posted the transcript of my conversation with GPT-4 that led to these docstrings.

@meren
Member

meren commented Apr 10, 2023

holy smokes :) what a great idea!

@meren
Member

meren commented Apr 10, 2023

How much manual curation is involved here for this commit?

@meren
Member

meren commented Apr 10, 2023

(asking because I couldn't find where you posted the transcript of your convo (am I being thick? is this literally a copy-paste from the prompt into db.py?))

@ekiefl
Contributor Author

ekiefl commented Apr 10, 2023

Lol you're right. Whoops, I meant to post it in this convo. Here it is:

Prompt:

I would like you to help me write some docstrings for a module called `db.py` that belongs in anvi'o. I'm sure you already know what anvi'o is. Anvi'o holds user data in dynamically managed SQLite3 databases, and `db.py` holds the class `DB`, which is the workhorse for all interfacing between anvi'o databases and python. I would like you to generate/improve the docstrings in `DB`. Please use Google style-guide formatting. Don't describe the type of arguments/attributes. Let's do a test drive with the following function. Please generate a docstring: ```python def _display_db_calls(func): def inner(self, *args, **kwargs): if self.read_only: raise ConfigError(f"Cannot call `DB.{func.__name__}` in read-only instance") else: return func(self, *args, **kwargs) return inner ```

Response:

def _display_db_calls(func):
    """
    A decorator to ensure that a database function can only be called when the `DB` instance is not read-only.
    
    This decorator wraps a given function and checks whether the `DB` instance is read-only before allowing the
    function to be called. If the `DB` instance is read-only, a `ConfigError` is raised with a message
    indicating that the specific function cannot be called on a read-only instance.

    Args:
        func (function): The function to be wrapped by the decorator.

    Returns:
        function: The wrapped function that checks for read-only instances before executing.

    Raises:
        ConfigError: If the `DB` instance is read-only and the wrapped function is called.
    """
    def inner(self, *args, **kwargs):
        if self.read_only:
            raise ConfigError(f"Cannot call `DB.{func.__name__}` in read-only instance")
        else:
            return func(self, *args, **kwargs)
    return inner
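
For anyone who wants to try the pattern this docstring describes, here is a minimal runnable sketch. It is an illustration only, not anvi'o code: `ConfigError` is a stand-in exception, `ToyDB` is a hypothetical class, and `functools.wraps` is my addition (the pasted decorator does not use it) so that the wrapped method keeps its own name and docstring.

```python
import functools


class ConfigError(Exception):
    """Stand-in for anvi'o's ConfigError (assumption for this sketch)."""


def not_if_read_only(func):
    """Refuse to run `func` when the instance is read-only."""
    @functools.wraps(func)  # keep the wrapped method's name and docstring
    def inner(self, *args, **kwargs):
        if self.read_only:
            raise ConfigError(f"Cannot call `DB.{func.__name__}` in read-only instance")
        return func(self, *args, **kwargs)
    return inner


class ToyDB:
    """Hypothetical minimal class, only here to exercise the decorator."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.rows = []

    @not_if_read_only
    def insert(self, row):
        """Append a row."""
        self.rows.append(row)
        return len(self.rows)
```

Without `functools.wraps`, `help(ToyDB.insert)` would show the docstring of `inner` (none) instead of the one written on `insert`, which matters if the goal is better docstrings.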

Prompt:

Good start. That is the perfect amount of description. Moving forward, please adhere to the following amendments: 1. The header should be on the same line as the triple quotes 2. Subsequent lines of a multi-line description should be indented with 4 spaces. 3. Whatever caused you to add ` (function)` after `func`, please cut that out. Ok with our procedure now described, I'm going to pass you another example: ```python def get_list_in_chunks(input_list, num_items_in_each_chunk=5000): """Yield smaller bits of a list""" for index in range(0, len(input_list), num_items_in_each_chunk): yield input_list[index:index + num_items_in_each_chunk] ```

Response:

def get_list_in_chunks(input_list, num_items_in_each_chunk=5000):
    """Yield smaller bits of a list.

    This function takes an input list and breaks it into smaller chunks, yielding each chunk one at a time.
    The size of the chunks can be controlled by the `num_items_in_each_chunk` parameter.

    Args:
        input_list (list): The input list to be divided into smaller chunks.
        num_items_in_each_chunk (int, optional): The number of items in each chunk. Defaults to 5000.

    Yields:
        list: A chunk of the input list with size `num_items_in_each_chunk`.
    """
    for index in range(0, len(input_list), num_items_in_each_chunk):
        yield input_list[index:index + num_items_in_each_chunk]

Prompt:

This is good, but I noticed you're adding the type of each argument. While that is the Google standard, this codebase is untyped, so I don't want you trying to infer types. Let's amend the procedure with a fourth criterion: 4. Please omit types. Ok, one last try here:

Response:

I apologize for the oversight. Here is the revised docstring without the inferred types:

def get_list_in_chunks(input_list, num_items_in_each_chunk=5000):
    """Yield smaller bits of a list.

    This function takes an input list and breaks it into smaller chunks, yielding each chunk one at a time.
    The size of the chunks can be controlled by the `num_items_in_each_chunk` parameter.

    Args:
        input_list: The input list to be divided into smaller chunks.
        num_items_in_each_chunk (optional): The number of items in each chunk. Defaults to 5000.

    Yields:
        A chunk of the input list with size `num_items_in_each_chunk`.
    """
    for index in range(0, len(input_list), num_items_in_each_chunk):
        yield input_list[index:index + num_items_in_each_chunk]
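
A quick usage sketch of the generator, using a small chunk size so the behavior is easy to see. The function body is copied from the prompt; the input list and chunk size are just example values. Note that the last chunk can be shorter than `num_items_in_each_chunk` when the list length is not a multiple of it.

```python
def get_list_in_chunks(input_list, num_items_in_each_chunk=5000):
    """Yield smaller bits of a list."""
    for index in range(0, len(input_list), num_items_in_each_chunk):
        yield input_list[index:index + num_items_in_each_chunk]


# 12 items in chunks of 5: two full chunks and one shorter final chunk
chunks = list(get_list_in_chunks(list(range(12)), num_items_in_each_chunk=5))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)  # [5, 5, 2]
```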

Prompt:

Looks great to me. To save tokens, let's add a final criterion to the procedure: 5. Refrain from writing the method body. Instead, just put `...`. Now, I'm going to feed you the `DB` class and its methods in chunks, for my own convenience. Please add/improve docstrings for this first chunk of the class: ```python class DB: def __init__(self, db_path, client_version, new_database=False, ignore_version=False, read_only=False, skip_rowid_prepend=False, run=terminal.Run(), progress=terminal.Progress()): self.db_path = db_path self.read_only = read_only self.version = None self.run = run self.progress = progress self.tables_to_exclude_from_db_call_reports = ['self', 'sqlite_master'] # these anonymous functions report whether the ROWID will be added # to its rows read from the database or not. if the first column of a given # table does not contain unique variables, anvi'o prepends the ROWID of each # column to index 0, unless `skip_rowid_prepend` is True self.ROWID_PREPENDS_ROW_DATA = lambda table_name: False if skip_rowid_prepend else tables.is_table_requires_unique_entry_id(table_name) self.PROPER_SELECT_STATEMENT = lambda table_name: 'ROWID as "entry_id", *' if self.ROWID_PREPENDS_ROW_DATA(table_name) else '*' if new_database: filesnpaths.is_output_file_writable(db_path) else: filesnpaths.is_file_exists(db_path) if new_database and os.path.exists(self.db_path): os.remove(self.db_path) if self.read_only and new_database: raise ConfigError("One cannot create a new database that is read-only.") if not self.read_only: self.check_if_db_writable() try: self.conn = sqlite3.connect(self.db_path) except Exception as e: raise ConfigError(f"This one time someone was not happy with '{self.db_path}' and '{e}', they said.") self.conn.text_factory = str self.cursor = self.conn.cursor() self.table_names_in_db = self.get_table_names() self.db_connected = True if new_database: self.create_self() self.set_version(client_version) else: self.version = self.get_version() if 
str(self.version) != str(client_version) and not ignore_version: if int(self.version) > int(client_version): progress.reset() raise ConfigError("Bad news of the day: the database at %s was generated with an anvi'o version that is 'newer' than " "the one you are actively using right now. We know, you hate to hear this, but you need to upgrade " "your anvi'o :(" % self.db_path) else: progress.reset() raise ConfigError(f"The database at '{self.db_path}' is outdated (this database is v{self.version} and your anvi'o installation " f"wants to work with v{client_version}). You can migrate your database without losing any data using the " f"program `anvi-migrate` with either of the flags `--migrate-safely` or `--migrate-quickly`.") bad_tables = [table_name for table_name in self.table_names_in_db if not tables.is_known_table(table_name)] if len(bad_tables): raise ConfigError("You better be a programmer tinkering with anvi'o databases adding new tables or something. Otherwise we " "have quite a serious problem :/ Each table in a given anvi'o database must have an entry in the " "anvio/tables/__init__.py dictionary `table_requires_unique_entry_id` to explicitly define whether anvi'o " "should add a unique entry id for its contents upon retrieval as a dictionary. The following tables " "in this database do not satisfy that: %s." % (', '.join([f"'{t}'" for t in bad_tables]))) def __enter__(self): return self def __exit__(self, *args): self.disconnect() def _display_db_calls(func): def inner(self, *args, **kwargs): if self.read_only: raise ConfigError(f"Cannot call `DB.{func.__name__}` in read-only instance") else: return func(self, *args, **kwargs) return inner def _not_if_read_only(func): def inner(self, *args, **kwargs): if self.read_only: raise ConfigError(f"Cannot call `DB.{func.__name__}` in read-only instance") else: return func(self, *args, **kwargs) return inner ```

Response:

class DB:
    """Anvi'o SQLite3 database management class.

    This class provides an interface for working with anvi'o SQLite3 databases. It allows for
    creating and managing tables, running queries, and interacting with the data in the database.
    """

    def __init__(self, db_path, client_version, new_database=False, ignore_version=False, read_only=False, skip_rowid_prepend=False,
                 run=terminal.Run(), progress=terminal.Progress()):
        """Initialize the DB instance.

        Args:
            db_path: The path to the SQLite3 database file.
            client_version: The anvi'o client version associated with the database.
            new_database (optional): Whether to create a new database. Defaults to False.
            ignore_version (optional): Whether to ignore version checking. Defaults to False.
            read_only (optional): Whether the database should be opened in read-only mode. Defaults to False.
            skip_rowid_prepend (optional): Whether to skip prepending ROWID to rows in some tables. Defaults to False.
            run (optional): A terminal.Run() instance. Defaults to a new terminal.Run() instance.
            progress (optional): A terminal.Progress() instance. Defaults to a new terminal.Progress() instance.
        """
        ...

    def __enter__(self):
        """Allow the DB instance to be used in a 'with' statement."""
        ...

    def __exit__(self, *args):
        """Clean up the DB instance upon exiting a 'with' statement."""
        ...

    def _display_db_calls(func):
        """A decorator to ensure that a database function can only be called when the DB instance is not read-only.

        This decorator wraps a given function and checks whether the DB instance is read-only before allowing the
        function to be called. If the DB instance is read-only, a ConfigError is raised with a message
        indicating that the specific function cannot be called on a read-only instance.

        Args:
            func: The function to be wrapped by the decorator.

        Returns:
            The wrapped function that checks for read-only instances before executing.

        Raises:
            ConfigError: If the DB instance is read-only and the wrapped function is called.
        """
        ...

    def _not_if_read_only(func):
        """A decorator to ensure that a database function can only be called when the DB instance is not read-only.

        This decorator is an alternative to _display_db_calls with the same functionality.

        Args:
            func: The function to be wrapped by the decorator.

        Returns:
            The wrapped function that checks for read-only instances before executing.

        Raises:
            ConfigError: If the DB instance is read-only and the wrapped function is called.
        """
        ...
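
The `__enter__`/`__exit__` pair documented above is what lets a `DB` instance be used in a `with` block. Here is a minimal sketch of that behavior with a hypothetical `ToyDB` class. It is not anvi'o's `DB`: there are no version checks, and it connects to an in-memory SQLite database. Only the connect/disconnect logic mirrors the pasted code.

```python
import sqlite3


class ToyDB:
    """Hypothetical stand-in for DB: only the connect/disconnect parts."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.db_connected = True

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # Runs even if the block raised; returns None, so exceptions propagate
        self.disconnect()

    def disconnect(self):
        if self.db_connected:
            self.conn.commit()
            self.conn.close()
            self.db_connected = False


with ToyDB(":memory:") as db:
    db.conn.execute("CREATE TABLE self (key text, value text)")
    connected_inside = db.db_connected

connected_after = db.db_connected
```

Inside the block the connection is open; once the block exits, `disconnect()` has committed and closed it.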

Prompt:

That's great. Feel free to describe what the method does more verbosely if the method logic is not straightforward. Moving on: ```python def get_version(self): try: return self.get_meta_value('version') except: raise ConfigError("%s does not seem to be a database generated by anvi'o :/" % self.db_path) def check_if_db_writable(self): check_counter = 0 check_interval = 1 # in seconds check_limit = 300 # 5 minutes, in seconds journal_path = self.db_path + '-journal' while(check_counter < check_limit and filesnpaths.is_file_exists(journal_path, dont_raise=True)): if check_counter == 0: # print only once self.run.info_single("It seems the database at '%s' currently used by another process " "for writing operations. Anvi'o refuses to work with this database to avoid corrupting it. " "If you think this is a mistake, you may stop this process and delete the lock file at '%s' after making sure " "no other active process is using it for writing. In case this program is run by an automatic workflow manager like snakemake " "Anvi'o will periodically check if the journal file still exists for total of %d minutes. If the database is still not writable " "after that time, Anvi'o will stop running. 
" % (os.path.abspath(self.db_path), os.path.abspath(journal_path), int(check_limit/60))) time.sleep(check_interval) check_counter += check_interval if not check_counter < check_limit: raise ConfigError("Database is not writable.") @_not_if_read_only def create_self(self): self._exec('''CREATE TABLE self (key text, value text)''') @_not_if_read_only def drop_table(self, table_name): """Delete a table in the database if it exists""" self._exec('''DROP TABLE IF EXISTS %s;''' % table_name) @_not_if_read_only def create_table(self, table_name, fields, types): if len(fields) != len(types): raise ConfigError("create_table: The number of fields and types has to match.") db_fields = ', '.join(['%s %s' % (t[0], t[1]) for t in zip(fields, types)]) self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields)) self.commit() self.table_names_in_db = self.get_table_names() @_not_if_read_only def set_version(self, version): self.set_meta_value('version', version) self.commit() @_not_if_read_only def set_meta_value(self, key, value): self.remove_meta_key_value_pair(key) self._exec('''INSERT INTO self VALUES(?,?)''', (key, value,)) self.commit() @_not_if_read_only def remove_meta_key_value_pair(self, key): self._exec('''DELETE FROM self WHERE key="%s"''' % key) self.commit() ```

Response:

    def get_version(self):
        """Get the anvi'o version associated with the database.

        Returns:
            The version of the anvi'o client that created the database.

        Raises:
            ConfigError: If the database doesn't appear to be generated by anvi'o.
        """
        ...

    def check_if_db_writable(self):
        """Check if the database is writable.

        This function checks if the database is currently locked by another process for writing operations.
        If it is locked, the function will wait for a maximum of 5 minutes (configurable) before raising
        a ConfigError. During this time, it will periodically check if the lock has been released.

        Raises:
            ConfigError: If the database is not writable after the maximum waiting time.
        """
        ...

    def create_self(self):
        """Create the 'self' table in the database.

        The 'self' table is a meta table used to store key-value pairs related to the database.
        """
        ...

    def drop_table(self, table_name):
        """Delete a table in the database if it exists.

        Args:
            table_name: The name of the table to be dropped.
        """
        ...

    def create_table(self, table_name, fields, types):
        """Create a new table in the database with the specified fields and types.

        Args:
            table_name: The name of the new table to be created.
            fields: A list of field names for the new table.
            types: A list of data types corresponding to the field names.

        Raises:
            ConfigError: If the number of fields and types do not match.
        """
        ...

    def set_version(self, version):
        """Set the anvi'o version associated with the database.

        Args:
            version: The version of the anvi'o client to be associated with the database.
        """
        ...

    def set_meta_value(self, key, value):
        """Set a meta key-value pair in the 'self' table.

        If the key already exists in the table, the existing key-value pair will be removed
        before the new one is added.

        Args:
            key: The meta key to be set.
            value: The meta value associated with the key.
        """
        ...

    def remove_meta_key_value_pair(self, key):
        """Remove a meta key-value pair from the 'self' table.

        Args:
            key: The meta key to be removed.
        """
        ...
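
To make the `self` table semantics concrete, here is a small sketch using plain `sqlite3` and an in-memory database. The two free functions are simplified stand-ins for the methods above, not the anvi'o implementations: I use `?` placeholders where the pasted code uses string formatting, and `KeyError` where the original raises `ConfigError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE self (key text, value text)")


def set_meta_value(conn, key, value):
    """Delete any existing row for `key`, then insert the new pair."""
    conn.execute("DELETE FROM self WHERE key=?", (key,))
    conn.execute("INSERT INTO self VALUES (?,?)", (key, value))
    conn.commit()


def get_meta_value(conn, key):
    """Return the stored value, converted to int when that is possible."""
    rows = conn.execute("SELECT value FROM self WHERE key=?", (key,)).fetchall()
    if not rows:
        raise KeyError(key)  # the pasted code raises ConfigError here
    val = rows[0][0]
    try:
        return int(val)
    except (TypeError, ValueError):
        return val


set_meta_value(conn, "version", "20")
set_meta_value(conn, "version", "21")  # replaces the old row, no duplicate
set_meta_value(conn, "project_name", "test")
```

Because setting a value always removes the old row first, each key appears at most once in the table, which is why `update_meta_value` and `set_meta_value` end up behaving the same way.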

Prompt:

```python @_not_if_read_only def update_meta_value(self, key, value): self.remove_meta_key_value_pair(key) self.set_meta_value(key, value) @_not_if_read_only def copy_paste(self, table_name, source_db_path, append=False): """Copy `table_name` data from another database (`source_db_path`) into yourself Arguments ========= append : bool, False If True, the table is appened to the source DB, rather than replaced. """ source_db = DB(source_db_path, None, ignore_version=True) num_entries_in_source = source_db.get_row_counts_from_table(table_name) if not num_entries_in_source: return # we are done with the source DB python object. The rest we do in SQL # for huge performance gains source_db.disconnect() if not append: self._exec('''DELETE FROM %s''' % table_name) self._exec('''ATTACH "%s" AS source_db''' % source_db_path) self._exec('''INSERT INTO main.%s SELECT * FROM source_db.%s''' % (table_name, table_name)) self._exec('''DETACH DATABASE "source_db"''') def _fetchall(self, response, table_name): """Wrapper for fetchall""" DISPLAY_DB_CALLS = False if table_name in self.tables_to_exclude_from_db_call_reports else anvio.DISPLAY_DB_CALLS if DISPLAY_DB_CALLS: sql_exec_timer = terminal.Timer() results = response.fetchall() if DISPLAY_DB_CALLS: self.run.info("fetchall", f"{sql_exec_timer.time_elapsed()}", mc='yellow', nl_after=1) return results def get_max_value_in_column(self, table_name, column_name, value_if_empty=None, return_min_instead=False): """Get the maximum OR minimum column value in a table Parameters ========== value_if_empty : object, None If not None and table has no entries, value returned is value_if_empty. """ response = self._exec("""SELECT %s(%s) FROM %s""" % ('MIN' if return_min_instead else 'MAX', column_name, table_name)) rows = self._fetchall(response, table_name) val = rows[0][0] if isinstance(val, type(None)): return value_if_empty try: val = int(val) except ValueError: pass return val ```

Response:

    def update_meta_value(self, key, value):
        """Update a meta key-value pair in the 'self' table.

        This function first removes the existing key-value pair (if any) and then sets the new
        key-value pair.

        Args:
            key: The meta key to be updated.
            value: The new meta value associated with the key.
        """
        ...

    def copy_paste(self, table_name, source_db_path, append=False):
        """Copy data from a table in another database into the current database.

        Args:
            table_name: The name of the table to copy data from.
            source_db_path: The path of the source database.
            append: If True, the table is appended to the source DB, rather than replaced. (default: False)
        """
        ...

    def _fetchall(self, response, table_name):
        """Wrapper for fetchall method on a response object from a database query.

        Args:
            response: A response object from a database query.
            table_name: The name of the table being queried.

        Returns:
            A list of tuples with the fetched data.
        """
        ...

    def get_max_value_in_column(self, table_name, column_name, value_if_empty=None, return_min_instead=False):
        """Get the maximum or minimum value in a specified column of a table.

        Args:
            table_name: The name of the table to query.
            column_name: The name of the column to find the maximum or minimum value.
            value_if_empty: If not None and table has no entries, the value returned is value_if_empty. (default: None)
            return_min_instead: If True, returns the minimum value instead of the maximum value. (default: False)

        Returns:
            The maximum or minimum value in the specified column, or value_if_empty if the table is empty.
        """
        ...
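
The `append` flag of `copy_paste` is easiest to understand by running the same SQL on two throwaway databases. As I read the pasted code, `append=False` empties the table in the current (destination) database before copying, while `append=True` adds the source rows to whatever is already there. The sketch below is a simplified, hypothetical `copy_table` function built on plain `sqlite3`; the table name `items` and its columns are made up for the example.

```python
import os
import sqlite3
import tempfile


def copy_table(dest_path, source_path, table_name, append=False):
    """Copy all rows of `table_name` from the source file into the destination file."""
    dest = sqlite3.connect(dest_path)
    if not append:
        # Replace mode: empty the destination table first
        dest.execute(f"DELETE FROM {table_name}")
        dest.commit()
    dest.execute("ATTACH ? AS source_db", (source_path,))
    dest.execute(f"INSERT INTO main.{table_name} SELECT * FROM source_db.{table_name}")
    dest.commit()  # ATTACH/DETACH cannot run inside an open transaction
    dest.execute("DETACH DATABASE source_db")
    dest.close()


def make_db(path, rows):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (name text, count integer)")
    conn.executemany("INSERT INTO items VALUES (?,?)", rows)
    conn.commit()
    conn.close()


def read_names(path):
    conn = sqlite3.connect(path)
    names = [r[0] for r in conn.execute("SELECT name FROM items ORDER BY name")]
    conn.close()
    return names


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.db")
    dst = os.path.join(tmp, "dest.db")
    make_db(src, [("a", 1), ("b", 2)])
    make_db(dst, [("z", 26)])

    copy_table(dst, src, "items", append=True)
    after_append = read_names(dst)   # ['a', 'b', 'z']

    copy_table(dst, src, "items", append=False)
    after_replace = read_names(dst)  # ['a', 'b']
```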

Prompt:

```python def get_meta_value(self, key, try_as_type_int=True, return_none_if_not_in_table=False): """if try_as_type_int, value is attempted to be converted to integer. If it fails, no harm no foul.""" response = self._exec("""SELECT value FROM self WHERE key='%s'""" % key) rows = self._fetchall(response, 'self') if not rows and return_none_if_not_in_table: return None if not rows: raise ConfigError("A value for '%s' does not seem to be set in table 'self'." % key) val = rows[0][0] if isinstance(val, type(None)): return None if try_as_type_int: try: val = int(val) except ValueError: pass return val def commit(self): self.conn.commit() def disconnect(self): if self.db_connected: self.conn.commit() self.conn.close() self.db_connected = False else: # it is already disconnected pass def _exec(self, sql_query, value=None): """Execute an arbitrary sql statement Notes ===== - This is a private method, and so it is presumed whoever uses it knows what they are doing. For this reason, it is not decorated with _not_if_read_only. It is therefore possible to write to the DB using this method, even with self.read_only = True """ # this is an ugly workaround to not display DB calls if they involve talbes # such as `self` or `sqlite_master` (comlete list in self.tables_to_exclude_from_db_call_reports). # Otherwise when the user sets the `--display-db-calls` flag, the output is heavily # dominated by queries to `self`. Even though Meren is implementing this sadness, # Iva agreed to it as well. 
Just saying: if any(table_name for table_name in self.tables_to_exclude_from_db_call_reports if f' {table_name} ' in sql_query): DISPLAY_DB_CALLS = False else: DISPLAY_DB_CALLS = anvio.DISPLAY_DB_CALLS if DISPLAY_DB_CALLS: self.progress.reset() self.run.warning(None, header='EXECUTING SQL', lc='yellow', nl_before=1) self.run.info_single(f"{os.path.abspath(self.db_path)}", cut_after=None, level=0, mc='yellow', nl_after=1) self.run.info_single(f"{sql_query}", cut_after=None, level=0, mc='yellow', nl_after=1) sql_exec_timer = terminal.Timer() if value: ret_val = self.cursor.execute(sql_query, value) else: ret_val = self.cursor.execute(sql_query) if DISPLAY_DB_CALLS: self.run.info("exec", f"{sql_exec_timer.time_elapsed()}", mc='yellow') self.commit() return ret_val def _exec_many(self, sql_query, values): """Execute many sql statements Notes ===== - This is a private method, and so it is presumed whoever uses it knows what they are doing. For this reason, it is not decorated with _not_if_read_only. It is therefore possible to write to the DB using this method, even with self.read_only = True """ chunk_counter = 0 for chunk in get_list_in_chunks(values): if anvio.DISPLAY_DB_CALLS: header = f"MULTI SQL // {chunk_counter} of {len(values)} with {len(chunk)} {len(chunk)} entries" self.run.warning(None, header=header, progress=self.progress, lc='yellow') self.run.info_single(f"{sql_query}", nl_after=1, cut_after=None, level=0, mc='yellow') sql_exec_timer = terminal.Timer() self.cursor.executemany(sql_query, chunk) if anvio.DISPLAY_DB_CALLS: self.run.info("exec", f"{sql_exec_timer.time_elapsed()}", mc='yellow') chunk_counter += 1 return True ```

Response:

    def get_meta_value(self, key, try_as_type_int=True, return_none_if_not_in_table=False):
        """Get the value associated with a key from the 'self' table.

        Args:
            key: The meta key to search for.
            try_as_type_int: If True, attempts to convert the value to an integer. If it fails, returns the original value. (default: True)
            return_none_if_not_in_table: If True, returns None if the key is not found in the table. (default: False)

        Returns:
            The value associated with the key, or None if the key is not found and return_none_if_not_in_table is True.
        """
        ...

    def commit(self):
        """Commit any pending transactions to the database."""
        ...

    def disconnect(self):
        """Disconnect from the database, committing any pending transactions and closing the connection."""
        ...

    def _exec(self, sql_query, value=None):
        """Execute an arbitrary SQL statement.

        Note: This is a private method and it is assumed that whoever uses it knows what they are doing.
        It is not decorated with _not_if_read_only, so it is possible to write to the DB using this method,
        even with self.read_only = True.

        Args:
            sql_query: The SQL query to execute.
            value: A single parameter to use in the SQL query (optional).

        Returns:
            The result of the executed SQL query.
        """
        ...

    def _exec_many(self, sql_query, values):
        """Execute many SQL statements.

        Note: This is a private method and it is assumed that whoever uses it knows what they are doing.
        It is not decorated with _not_if_read_only, so it is possible to write to the DB using this method,
        even with self.read_only = True.

        Args:
            sql_query: The SQL query to execute.
            values: A list of values to be used as parameters in the SQL query.

        Returns:
            True if all SQL statements were executed successfully.
        """
        ...
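
A sketch of the chunked `executemany` idea behind `_exec_many`, again with plain `sqlite3` and an in-memory database. The `exec_many` function here is hypothetical: it returns the number of chunks instead of `True`, takes the chunk size as a parameter, and commits at the end, none of which the pasted method does.

```python
import sqlite3


def get_list_in_chunks(input_list, num_items_in_each_chunk=5000):
    for index in range(0, len(input_list), num_items_in_each_chunk):
        yield input_list[index:index + num_items_in_each_chunk]


def exec_many(conn, sql_query, values, chunk_size=5000):
    """Run `executemany` once per chunk and return the number of chunks."""
    cursor = conn.cursor()
    num_chunks = 0
    for chunk in get_list_in_chunks(values, chunk_size):
        cursor.executemany(sql_query, chunk)
        num_chunks += 1
    conn.commit()
    return num_chunks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a integer, b text)")

rows = [(i, f"row_{i}") for i in range(12)]
num_chunks = exec_many(conn, "INSERT INTO t VALUES (?,?)", rows, chunk_size=5)
num_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

With 12 rows and a chunk size of 5, `executemany` runs three times and all 12 rows end up in the table.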

Prompt:

```python @_not_if_read_only def insert(self, table_name, values=()): query = '''INSERT INTO %s VALUES (%s)''' % (table_name, ','.join(['?'] * len(values))) return self._exec(query, values) @_not_if_read_only def insert_many(self, table_name, entries=None): if len(entries): query = '''INSERT INTO %s VALUES (%s)''' % (table_name, ','.join(['?'] * len(entries[0]))) return self._exec_many(query, entries) @_not_if_read_only def insert_rows_from_dataframe(self, table_name, dataframe, raise_if_no_columns=True): """Insert rows from a dataframe Parameters ========== raise_if_no_columns : bool, True If True, if dataframe has no columns (e.g. dataframe = pd.DataFrame({})), this function returns without raising error. Notes ===== - This should one day be replaced with the following code: if 'entry_id' in structure: # This table has an entry_id of, we have to be aware of it if 'entry_id' in df.columns: # The user already has an 'entry_id' column. We assume they know what they are doing next_available_id = df['entry_id'].max() + 1 else: num_entries = df.shape[0] next_available_id = self.get_max_value_in_column(name, 'entry_id', value_if_empty=-1) + 1 df['entry_id'] = range(next_available_id, next_available_id + num_entries) next_available_id += num_entries else: next_available_id = None # subset columns and reorder according to the table structure df = df[structure] dtypes = dict(zip(structure, types)) df.to_sql( name, self.conn, if_exists='append', chunksize=chunksize, dtype=dtypes, index=False ) return next_available_id """ self.is_table_exists(table_name) if not list(dataframe.columns) and not raise_if_no_columns: # if the dataframe has no colums, we just return return if len(set(dataframe.columns)) != len(list(dataframe.columns)): raise ConfigError("insert_rows_from_dataframe :: There is at least one duplicate column " "name in the dataframe. 
Here is the list of columns: [{}].".\ format(", ".join(list(dataframe.columns)))) if set(dataframe.columns) != set(self.get_table_structure(table_name)): raise ConfigError("insert_rows_from_dataframe :: The columns in the dataframe " "do not equal the columns of the requested table. " "The columns from each are respectively ({}); and ({}).".\ format(", ".join(list(dataframe.columns)), ", ".join(self.get_table_structure(table_name)))) # conform to the column order of the table structure dataframe = dataframe[self.get_table_structure(table_name)] entries = [tuple(row) for row in dataframe.values] self.insert_many(table_name, entries=entries) def is_table_exists(self, table_name): if table_name not in self.table_names_in_db: raise ConfigError(f"The database at {self.db_path} does not seem to have a table named `{table_name}` :/ " f"Here is a list of table names this database knows: {', '.join(self.table_names_in_db)}") ```

Response:

    @_not_if_read_only
    def insert(self, table_name, values=()):
        """Insert a new row into the specified table.

        Args:
            table_name: The name of the table to insert the row into.
            values: A tuple of values to insert as a new row.

        Returns:
            The result of the executed SQL query.
        """
        ...

    @_not_if_read_only
    def insert_many(self, table_name, entries=None):
        """Insert multiple rows into the specified table.

        Args:
            table_name: The name of the table to insert the rows into.
            entries: A list of tuples, where each tuple contains the values for a row.

        Returns:
            The result of the executed SQL query.
        """
        ...

    @_not_if_read_only
    def insert_rows_from_dataframe(self, table_name, dataframe, raise_if_no_columns=True):
        """Insert rows from a dataframe into the specified table.

        Args:
            table_name: The name of the table to insert the rows into.
            dataframe: A pandas DataFrame containing the rows to insert.
            raise_if_no_columns: If True, raises an error if the dataframe has no columns. (default: True)

        Raises:
            ConfigError: If the columns in the dataframe do not match the columns in the table.
        """
        ...

    def is_table_exists(self, table_name):
        """Check if a table with the given name exists in the database.

        Args:
            table_name: The name of the table to check for existence.

        Raises:
            ConfigError: If the table does not exist in the database.
        """
        ...
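
The `insert` method builds its query by emitting one `?` placeholder per value. A minimal sketch of that step in isolation, assuming a made-up `genes` table; `build_insert_query` is a hypothetical helper name, not something in `db.py`.

```python
import sqlite3


def build_insert_query(table_name, values):
    """Build `INSERT INTO <table> VALUES (?,?,...)` with one `?` per value."""
    placeholders = ','.join(['?'] * len(values))
    return f"INSERT INTO {table_name} VALUES ({placeholders})"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_id integer, name text)")

row = (1, "rpoB")
query = build_insert_query("genes", row)
conn.execute(query, row)
stored = conn.execute("SELECT * FROM genes").fetchall()
```

The values themselves are passed separately to `execute`, so only the table name is interpolated into the SQL string.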

Prompt:

There is an important note in the original `insert_rows_from_dataframe`. Could you please include that? Don't rewrite the whole chunk, just that method docstring.

Response:

    @_not_if_read_only
    def insert_rows_from_dataframe(self, table_name, dataframe, raise_if_no_columns=True):
        """Insert rows from a dataframe into the specified table.

        Args:
            table_name: The name of the table to insert the rows into.
            dataframe: A pandas DataFrame containing the rows to insert.
            raise_if_no_columns: If True, raises an error if the dataframe has no columns. (default: True)

        Raises:
            ConfigError: If the columns in the dataframe do not match the columns in the table.

        Notes:
        =====
        - This should one day be replaced with the following code:
            if 'entry_id' in structure:
                # This table has an entry_id of, we have to be aware of it
                if 'entry_id' in df.columns:
                    # The user already has an 'entry_id' column. We assume they know what they are doing
                    next_available_id = df['entry_id'].max() + 1
                else:
                    num_entries = df.shape[0]
                    next_available_id = self.get_max_value_in_column(name, 'entry_id', value_if_empty=-1) + 1
                    df['entry_id'] = range(next_available_id, next_available_id + num_entries)
                    next_available_id += num_entries
            else:
                next_available_id = None

            # subset columns and reorder according to the table structure
            df = df[structure]

            dtypes = dict(zip(structure, types))

            df.to_sql(
                name,
                self.conn,
                if_exists='append',
                chunksize=chunksize,
                dtype=dtypes,
                index=False
            )

            return next_available_id
        """
        ...

@meren
Member

meren commented Apr 10, 2023

[image]

@ekiefl
Contributor Author

ekiefl commented Apr 10, 2023

It's annoying that the user prompt isn't formatted, but hopefully you get the idea. I'm just copy pasting the class 5-6 methods at a time.

To answer your question, I didn't manually curate anything. What you see is what you get. The output seems at least as accurate as the average docstring, which all too often is stale.
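
If anyone wants a cheap sanity check before trusting generated docstrings, one option is to compare the names in each `Args:` section with the actual signature. The sketch below is only an idea, not something in this PR: `documented_args` and `missing_and_extra` are hypothetical helpers, the parser assumes the Google-style layout used in the transcript, and the example docstring has been altered on purpose so that it disagrees with the signature. It checks names only and says nothing about whether the descriptions are correct.

```python
import inspect
import re


def documented_args(func):
    """Return argument names listed under a Google-style `Args:` section."""
    doc = inspect.getdoc(func) or ""
    names, in_args = [], False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and not line.startswith(" "):
            in_args = False  # a blank or dedented line closes the section
        elif in_args:
            # argument lines sit at exactly four spaces; deeper lines are continuations
            match = re.match(r" {4}(\w+)(?: \([^)]*\))?:", line)
            if match:
                names.append(match.group(1))
    return names


def missing_and_extra(func):
    """Compare the signature with the docstring; return (missing, extra)."""
    params = [p for p in inspect.signature(func).parameters if p != "self"]
    documented = documented_args(func)
    missing = [p for p in params if p not in documented]
    extra = [d for d in documented if d not in params]
    return missing, extra


def get_max_value_in_column(self, table_name, column_name, value_if_empty=None, return_min_instead=False):
    """Get the maximum or minimum value in a specified column of a table.

    Args:
        table_name: The name of the table to query.
        column_name: The name of the column to find the maximum or minimum value.
        value_if_empty: Returned when the table has no entries.
        old_flag (optional): A parameter that no longer exists.
    """
    ...


missing, extra = missing_and_extra(get_max_value_in_column)
print(missing, extra)  # ['return_min_instead'] ['old_flag']
```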

@meren
Member

meren commented Apr 10, 2023

This is absolutely useful. Thank you very much for sending your prompt.
