Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fueling discussion: Is it sensible to assume mysql during setup? #634

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

bernd-wechner
Copy link
Contributor

I can't assess the purpose of this in the broader context of OWA but during install, if I select postgresql (I'm experimenting with that branch on my repo) then owa_settings.__construct lands here twice. The first time with an explicit entity creation for base.configuration at line 60 in settings.php currently, and the second time on line 80 via a call to setupStorageEngine.

Moreover at the time of the first call OWA_DB_TYPE is as yet unset so it drops back onto a mysql setup (a definite no-no IMO).

There is inconsistent methodology for ensuring a single load, with entityFactory consulting entityFactory (an arbitrary variable known to be set in the database plugin file) to determine if the file's been loaded and setupStorageEngine using a require_once strategy. Again a no-no, consistency of approach is desirable.

Questions on the table:

  1. Does entityFactory need to setup the Storage engine at all?
  2. Does it need to setup a Storage engine for all entities or only some?
  3. Is it ever kosher to fall back on mysql (IMO no)? Or is it better to bail?

This is NOT a complete PR. There is BIG issue at play still in that:

  1. the constructor for owa_configuration requires database information.
  2. this information is not yet available for the selected database when it is called in the owa_settings constructor
  3. currently this falls back to mysql (the bit this PR changes), but if we don't fall back to something, the configuration constructor fails.
  4. I've tried delaying this database configuration in the configuration object but things go wrong fast. It will take more work to understand the nuance of ordering here.
  5. what is clear is the a configurable database engine revels a strange race condition here, that needs resolving methinks (in which the configuration object needs to know about the database but it it doesn't know what type of database yet.

Am hoping more experience eyes can resolve this and comment. For now this is a very short PR to fuel that thinking. If an issue is more appropriate just say, no problem there. Just figured could work on and evolve this PR to resolve the issue.

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Tested the changes
  • Build (/path/to/php cli.php cmd=build) was run locally and any changes were pushed

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Other (please describe):

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Does this introduce a breaking change?

  • Yes
  • No

Docs need to be updated?

  • Yes
  • No

Other information

This came to my attention because if I select a postgresql database (am experimenting with that) the installer ends up loading the mysql configuration  then the postgresql configuration. That throws a pile of non fatal errors on my console (hence is noticeable). But raises some questions.

The first load (before the db_type is known) is with an explicit entity creation for base.configuration at line 60 in settings.php, and the second load (now knowing the db_type) on line 80 via a call to setupStorageEngine.

There is inconsistent methodology for ensuring a single load as well, with entityFactory consulting OWA_DTD_INT (an arbitrary variable known to be set in the database plugin file) to determine if the file's been loaded and setupStorageEngine using a require_once strategy. A no-no, consistency of approach is desirable.

Questions on the table:

1. Does entityFactory need to setup the Storage engine at all?
2. Does it need to setup a Storage engine for all entities or only some?
3. Is it ever kosher to fall back on mysql (IMO no)? Or is it better to bail? Or can we, must we find a way to order things so that fallback is never needed?

This is NOT a complete PR. There is a BIG issue at play still in that:

1. the constructor for owa_configuration requires database information.
2. this information is not yet available for the selected database when it is called in the owa_settings constructor
3. currently this falls back to mysql (the bit this PR changes), but if we don't fall back to something, the configuration constructor fails.
4. I've tried delaying this database configuration in the configuration object but things go wrong fast. It will take more work to understand the nuance of ordering here. 
5. what is clear is the a configurable database engine revels a strange race condition here, that needs resolving methinks (in which the configuration object needs to know about the database but it it doesn't know what type of database yet. 

Am hoping more experience eyes can resolve this and comment. For now this is a very short PR to fuel that thinking. If an issue is more appropriate just say, no problem there. Just figured could work on and evolve this PR to resolve the issue.
@padams
Copy link
Collaborator

padams commented Jun 24, 2020

Setup storage engine is required because it loads the DBclass file so that the DTD constants are made available to entities.

Since the config obj is an entity the constants need to be defined early before it is created.

Removing the hardcoded MySQL-ism should be enough to get this to work as long as the new Postgres dB class has all the same constants defined as the MySQL one.

There is nothing about attempting to fall back to MySQL (or any on another DB) in the design. If that’s happening then there must be another hard coded string somewhere.

@bernd-wechner
Copy link
Contributor Author

Thanks for checking it out. I'd be curious how and why I found it falling back to mysql then. Will look deeper, but you can help with a tip if you have time (as I'm not sure how much time and when I can give it, we'll see). The point to note is that the reason for instigating this conversation is that I was seeing it fall back to mysql on that very line, when the base.configuration entity (the first) was being created. In essence because OWA_DB_TYPE was not yet defined when these lines were hit. Why that was demands further study alas. I'm just reloading the installer and usng a PHP debugger. The key question becomes: Where and when is the DTD loaded and is it reliably before the owa_settings are constructed? If it is reliably before then (which empirically was not happening) then indeed, all good, the fall back simply needs to be removed from here (could add an assertion here that OWA_DB_TYPE be define.

@bernd-wechner
Copy link
Contributor Author

OK a little more research. This way broken. And I'm a bit stuck on fixing it.

if owa_coreAPI::entityFactory contains this:

        // Must be called before any entities are created
        if (defined('OWA_DB_TYPE')) {
            if (!defined('OWA_DTD_INT')) {
                owa_coreAPI::setupStorageEngine(OWA_DB_TYPE);
            }
        } else {
            owa_coreAPI::error(sprintf('Database type is needed and not supplied.'));
        }

And I load install.php:

  1. It loads fine without issue.
  2. If I put a breakpoint on the call to owa_coreAPI::error it breaks there and I can see defined('OWA_DB_TYPE') is false. Which means:
    a.It seems owa_coreAPI::error does nothing when it's run (I was hoping it would bail with an error)
    b. OWA_DB_TYPE is not defined.

in fact in the owa_settings constructor we see at end, after this is run:

         // include storage engine class so that DTD constants get loaded
         owa_coreAPI::setupStorageEngine($this->get('base','db_type'));

The comment itself says load the DTD. But you'r saying the DTD is needed in the the entity factory. We can't have it both ways. Something is seriously broken here.

owa_coreAPI::entityFactory('base.configuration') needs OWA_DB_TYPE and this is not available until the config file is loaded.
Thanks to the fix in settings.py this can now safely assume that OWA_DB_TYPE is set.
@bernd-wechner
Copy link
Contributor Author

OK I found the reason, I fixed it. And this PR I propose is now safe and ready for merge. But I am happy to hear feedback. It turns out that OWA_DB_TYPE is set by loadConfigFile() and we can safely run loadConfigFile before we run owa_coreAPI::entityFactory('base.configuration'). Tried and tested and stuidied code to convince myself of the truth of that.

Now that OWA_DB_TYPE is guaranteed to be available by the time entityFactory is called we know that entityFactory will call setupStorageEngine if needed and so we definitely do not need to call it again at the end of owa_settings.__construct().

I suggest this is fixed now. The correct database plug in will be loaded, once and only once by the time it is needed.

There is a TODO: left in the constructor, but it's a legacy todo unrelated to this change.

@bernd-wechner
Copy link
Contributor Author

Actually just tested this without a config file and it's not clean. It runs fine, got no gripes there and it goes cleanly to the Configuration Settings page. But it does throw a lot of errors of the form:

PHP Warning: Use of undefined constant SOME_DTD_CONSTANT - assumed 'SOME_DTD_CONSTANT' (this will throw an Error in a future version of PHP)

Will look into it, but it has clearly to do with code that's running that needs a DTD (as it stands) but clearly does not in order to go to the next page titled Configuration Settings.

My conclusion: there are more bugs here. Code that is running when it can't, shouldn't etc. It has not DTD yet, so should not be running. The Configuration Settings shoudl be rendered, a config file created than then it should run.

@bernd-wechner
Copy link
Contributor Author

Of these are all legacy issues related to the fallback on a mysql DTD. Removing that reveals them. They are however worth fixing. That is, the installer should, if noticing no config, do minimal (DTD free) initialisations before presenting the Configuration Settings page, which once saved, specifies a DTD and all is back to normal.

@bernd-wechner
Copy link
Contributor Author

As these pre-config DTD accesses are a legacy error that stems from the assumption of a mysql DTD, I collated them and identified them all. They occur only in these files:

/modules/base/classes/factTable.php
/modules/base/entities/action_fact.php
/modules/base/entities/click.php
/modules/base/entities/commerce_line_item_fact.php
/modules/base/entities/commerce_transaction_fact.php
/modules/base/entities/domstream.php
/modules/base/entities/feed_request.php
/modules/base/entities/request.php
/modules/base/entities/session.php
/modules/base/entities/user.php

but number 183 in total, much greater in number if duplicate hits counted. But essentially it relates to entity creation.

The can all be made DTD safe by making the DTD accesses in each contingent upon defined('OWA_DB_TYPE').

I have tried skipping the creation of these entities altogether pre-config (when defined('OWA_DB_TYPE') is not defined) but things break fast as there are dependencies upon them somewhere that the setup splash page and config page still rely on (and shouldn't!).

It's easier (more cost effective) to fix the entities to access DTD only when available and leave those (probably unnecessary dependencies upon entities for the splash page and the config page) in place, unless I can find them quickly.

But I appreciate your learned input in direction here ... this PR as it stands is sold with a config file, but removing the config file loads of DTD access errors which are currently just warnings but nonetheless buggy happen. There are only two page views prior to a config file existing and neither of these should rely on much of the back end framework at all, let alone the DTD.

So don't. Load metrics that is. _loadMetrics relies on the database definition.
User entity is needed prior to database definition being available when rendering the Configuration Settings page. It probably shouldn't be, but the easiest way to avoid errors for now, is to make a User entity available without data type definitions.  That satisfies the Configuration Settings page.
@bernd-wechner
Copy link
Contributor Author

bernd-wechner commented Jun 27, 2020

Righto, I think I nailed it. At least I'm up to step three in the install without errors. I'll add commits to the PR for these:

  1. Prior to config being available OWA needs a config and user entity at least.
    a. these both access the DTD but we can defer typing their columns until one is available
  2. It tries to load metrics, wholly unneeded, accessing the DTD.

Added two commits covering this. Which makes this PR functional for:

  1. The case where a config file exists
  2. The case where it does not, for the splash page and the Configuration Settings page

This PR is robust and gets me to the "Default Site & User Information" page free of error and is ripe for merge into OWA main.

Pending of course code inspection and tests (tests fine my end).

@padams
Copy link
Collaborator

padams commented Jun 27, 2020

I can look into this over the weekend.

@bernd-wechner
Copy link
Contributor Author

Thanks. I'm deeply stuck on what the next page is and what it's asking for: "Default Site & User Information" I can't find any documentation on this that elucidates. What is this asking for?

@bernd-wechner
Copy link
Contributor Author

Google returns just one page on a hard search:

https://www.how2shout.com/how-to/install-open-web-analytics-owa-ubuntu-centos.html

And it is not even clear on what that page is. The URL seems to be the URL of the site we want to track. But the username and password? How does this differ from Username and password defined on the prior page "Configuration Settings "?

@padams
Copy link
Collaborator

padams commented Jun 27, 2020

ok i see the race condition you've been working through here. It goes back to an early design decision to use constants for the DTDs in the entity classes themselves. But that really isn't needed... as those constants are just an abstraction layer that is only really used when tables are created or at least long after we have a valid DB connection...

So... i think the right thing to do here is to renovate the entities to use strings instead of constants to declare their DTDs. Then we can look up or translate those string keys just before they are really needed.

@padams
Copy link
Collaborator

padams commented Jun 27, 2020

I think the look up could just be a translation of the string to the constant using php's constant function. So two things would need to happen:

  1. convert all constants in the entity classes to strings.
  2. in owa_column::getDefinition convert the DTD string to the constant like so:
function getDefinition() {

         $definition = '';

         $definition .= constant( $this->get('data_type') );

        // Check for auto increment
        if ($this->get('auto_increment') == true):
            $definition .= ' ' . constant( 'OWA_DTD_AUTO_INCREMENT' );
        endif;

        // Check for auto Not null
        if ($this->get('is_not_null') == true):
            $definition .= ' ' . constant( 'OWA_DTD_NOT_NULL' );
        endif;

        // Check for unique
        if ($this->get('is_unique') == true):
            $definition .= ' ' . constant( 'OWA_DTD_UNIQUE' );
        endif;

        // check for primary key
        if ($this->get('is_primary_key') == true):
            $definition .= ' ' . constant( 'OWA_DTD_PRIMARY_KEY' );
            //$definition .= sprintf(", INDEX (%s)", $this->get('name'));
        endif;

        // check for index
        if ($this->get('index') == true):
            $definition .= sprintf(", INDEX (%s)", $this->get('name'));
        endif;

         return $definition;

     }

there might be a few more places in owa_db class that would be needed as well.

@bernd-wechner
Copy link
Contributor Author

Sounds great. Just need to decide on directions to proceed. I have postgresql working here:

https://github.com/bernd-wechner/Open-Web-Analytics/tree/Postgresql

so any such change (to defer access to the DTD until it's know - which is very sensible) can either go into master here, and I merge it back into my branch or I just build it into the postgresql branch above and PR the whole bundle after I'm content. I'm install content as is, but want to move it to a production environment for a bit to see if it works fine all round.

The main uncertainty I have for now is the mapping of BLOB to TEXT which I needed or the configuration.settings (postgresql suggests mapping BLOB to BYTEA or TEXT, but BYTEA is not read back in the way MySQL reads BLOB back and the settings don't unserialize. TEXT works without changing any of the reading code. But the question for testing remains, whether TEXT works in all the contexts OWA uses BOB, which are:

configuration.settings
site.settings
session.latest_attributions
queue_item.event
domstream.events

The first two or three are likely to be fine, the last two less confident. Pending tests.

@padams
Copy link
Collaborator

padams commented Jun 28, 2020

I'll make these core changes in a branch that you can pull from.

padams pushed a commit that referenced this pull request Jun 28, 2020
…ype string to owa_column and owa_db. supports #634
@padams
Copy link
Collaborator

padams commented Jun 28, 2020

Ok. commit just landed in a new branch called pg_support. Start pulling from there and let me know if you need other changes.

@bernd-wechner
Copy link
Contributor Author

I'll take a look soon. Thanks. Did a bit of a sprint getting it this far (steep learning curve on a few fronts). Looking forward to trying it out live. A few things to sort before I can roll it out.

@padams
Copy link
Collaborator

padams commented Jan 1, 2023

Picking this back up... not sure we need this PR any more. The fix for the mysql fallback race condition turned out to be a lot easier to fix than I thought. It's in master now. If you want to pick this back up and land your PG plugin then rebase with master for the changes. There area also a number of other changes in there to the installer to ensure that we do not try to load the config from the db until the actual install is done.

@padams padams added the hasfix a fix has been applied to master. label Jan 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hasfix a fix has been applied to master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants