Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace error on serialized data #105

Open
TimSmith opened this issue Mar 7, 2024 · 9 comments
Open

Replace error on serialized data #105

TimSmith opened this issue Mar 7, 2024 · 9 comments

Comments

@TimSmith
Copy link

TimSmith commented Mar 7, 2024

If you run a search on serialised data decimals can be changed. There's some odd rounding going on.

e.g. in version 1.4.5 with PHP 7.4.33 search for 'notinthedata' with nothing in the replacement field and despite the search string not being present it will want to change this existing data

a:2:{s:5:"stats";a:8:{s:7:"percent";d:54.26483359085116;s:5:"bytes";i:433035;s:11:"size_before";i:798003;s:10:"size_after";i:364968;s:4:"time";d:4.26;s:11:"api_version";s:3:"1.0";s:5:"lossy";b:1;s:9:"keep_exif";i:0;}s:5:"sizes";a:12:{s:6:"medium";O:8:"stdClass":5:{s:7:"percent";d:13.66;s:5:"bytes";i:1806;s:11:"size_before";i:13217;s:10:"size_after";i:11411;s:4:"time";d:0.3;}s:9:"thumbnail";O:8:"stdClass":5:{s:7:"percent";d:15.74;s:5:"bytes";i:948;s:11:"size_before";i:6023;s:10:"size_after";i:5075;s:4:"time";d:0.03;}s:8:"us_600_0";O:8:"stdClass":5:{s:7:"percent";d:14.03;s:5:"bytes";i:6062;s:11:"size_before";i:43216;s:10:"size_after";i:37154;s:4:"time";d:0.27;}s:15:"us_600_400_crop";O:8:"stdClass":5:{s:7:"percent";d:0.42;s:5:"bytes";i:155;s:11:"size_before";i:37154;s:10:"size_after";i:36999;s:4:"time";d:0.82;}s:8:"us_768_0";O:8:"stdClass":5:{s:7:"percent";d:14.66;s:5:"bytes";i:9818;s:11:"size_before";i:66963;s:10:"size_after";i:57145;s:4:"time";d:0.38;}s:21:"woocommerce_thumbnail";O:8:"stdClass":5:{s:7:"percent";d:13.64;s:5:"bytes";i:2534;s:11:"size_before";i:18582;s:10:"size_after";i:16048;s:4:"time";d:0.05;}s:18:"woocommerce_single";O:8:"stdClass":5:{s:7:"percent";d:0.31;s:5:"bytes";i:116;s:11:"size_before";i:36999;s:10:"size_after";i:36883;s:4:"time";d:0.16;}s:29:"woocommerce_gallery_thumbnail";O:8:"stdClass":5:{s:7:"percent";d:0.71;s:5:"bytes";i:36;s:11:"size_before";i:5075;s:10:"size_after";i:5039;s:4:"time";d:0.04;}s:12:"shop_catalog";O:8:"stdClass":5:{s:7:"percent";d:0.68;s:5:"bytes";i:109;s:11:"size_before";i:16048;s:10:"size_after";i:15939;s:4:"time";d:0.14;}s:11:"shop_single";O:8:"stdClass":5:{s:7:"percent";d:0.15;s:5:"bytes";i:56;s:11:"size_before";i:36883;s:10:"size_after";i:36827;s:4:"time";d:0.12;}s:14:"shop_thumbnail";O:8:"stdClass":5:{s:7:"percent";d:0.4;s:5:"bytes";i:20;s:11:"size_before";i:5039;s:10:"size_after";i:5019;s:4:"time";d:0.02;}s:4:"full";O:8:"stdClass":5:{s:7:"percent";d:80.22;s:5:"bytes";i:411375;s:11:"size_before";i:512804;s:10:"size_after";i:101429;s:4:"time";d:1.93;}}}

to this replacement

a:2:{s:5:"stats";a:8:{s:7:"percent";d:54.264833590851161;s:5:"bytes";i:433035;s:11:"size_before";i:798003;s:10:"size_after";i:364968;s:4:"time";d:4.2599999999999998;s:11:"api_version";s:3:"1.0";s:5:"lossy";b:1;s:9:"keep_exif";i:0;}s:5:"sizes";a:12:{s:6:"medium";O:8:"stdClass":5:{s:7:"percent";d:13.66;s:5:"bytes";i:1806;s:11:"size_before";i:13217;s:10:"size_after";i:11411;s:4:"time";d:0.29999999999999999;}s:9:"thumbnail";O:8:"stdClass":5:{s:7:"percent";d:15.74;s:5:"bytes";i:948;s:11:"size_before";i:6023;s:10:"size_after";i:5075;s:4:"time";d:0.029999999999999999;}s:8:"us_600_0";O:8:"stdClass":5:{s:7:"percent";d:14.029999999999999;s:5:"bytes";i:6062;s:11:"size_before";i:43216;s:10:"size_after";i:37154;s:4:"time";d:0.27000000000000002;}s:15:"us_600_400_crop";O:8:"stdClass":5:{s:7:"percent";d:0.41999999999999998;s:5:"bytes";i:155;s:11:"size_before";i:37154;s:10:"size_after";i:36999;s:4:"time";d:0.81999999999999995;}s:8:"us_768_0";O:8:"stdClass":5:{s:7:"percent";d:14.66;s:5:"bytes";i:9818;s:11:"size_before";i:66963;s:10:"size_after";i:57145;s:4:"time";d:0.38;}s:21:"woocommerce_thumbnail";O:8:"stdClass":5:{s:7:"percent";d:13.640000000000001;s:5:"bytes";i:2534;s:11:"size_before";i:18582;s:10:"size_after";i:16048;s:4:"time";d:0.050000000000000003;}s:18:"woocommerce_single";O:8:"stdClass":5:{s:7:"percent";d:0.31;s:5:"bytes";i:116;s:11:"size_before";i:36999;s:10:"size_after";i:36883;s:4:"time";d:0.16;}s:29:"woocommerce_gallery_thumbnail";O:8:"stdClass":5:{s:7:"percent";d:0.70999999999999996;s:5:"bytes";i:36;s:11:"size_before";i:5075;s:10:"size_after";i:5039;s:4:"time";d:0.040000000000000001;}s:12:"shop_catalog";O:8:"stdClass":5:{s:7:"percent";d:0.68000000000000005;s:5:"bytes";i:109;s:11:"size_before";i:16048;s:10:"size_after";i:15939;s:4:"time";d:0.14000000000000001;}s:11:"shop_single";O:8:"stdClass":5:{s:7:"percent";d:0.14999999999999999;s:5:"bytes";i:56;s:11:"size_before";i:36883;s:10:"size_after";i:36827;s:4:"time";d:0.12;}s:14:"shop_thumbnail";O:8:"stdClass":5:{s:7:"percent";d:0.40000000000000002;s:5:"bytes";i:20;s:11:"size_before";i:5039;s:10:"size_after";i:5019;s:4:"time";d:0.02;}s:4:"full";O:8:"stdClass":5:{s:7:"percent";d:80.219999999999999;s:5:"bytes";i:411375;s:11:"size_before";i:512804;s:10:"size_after";i:101429;s:4:"time";d:1.9299999999999999;}}}

@kevinwhoffman
Copy link

Hi @TimSmith, thank you for reporting this. If you attempt this search as a dry run, does the UI indicate that a replacement will be made?

@TimSmith
Copy link
Author

TimSmith commented Mar 8, 2024

Yes it does. I put some debug code into BSR_DB srdb() and pulled those values from $data_to_fix and $edited_data respectively.

@kevinwhoffman
Copy link

@TimSmith Thanks for confirming. It appears that the data you shared is generated by WooCommerce. Could you provide some steps to recreate such data so we can test further?

While we can drop your sample data directly into the database to test, it would be more helpful if we can replicate it from scratch using the same plugins that you used so we can understand how widespread this issue may be.

@TimSmith
Copy link
Author

It's data about smush images taken from wp_postmeta with a meta_key 'wp-smpro-smush-data'. I presume it's from WP Smush Pro, but as they use it to keep track of what's been smushed and what hasn't there's probably an analogous record for the free version.

In principle though it looks like anything that stores serialised floating point numbers could trigger the issue as the precision cannot be guaranteed to stay the same when re-encoding. For information in case it's relevant the site has PHP ini directives serialize:14 and serialize_precison:17

@kevinwhoffman
Copy link

Thanks for the details. We are investigating further and will patch if needed.

@kevinwhoffman
Copy link

kevinwhoffman commented Mar 18, 2024

Findings

  • We believe the changed data is a side effect of deserialization during the search-and-replace process. For more information about why we deserialize data, see https://deliciousbrains.com/wp-migrate-db-pro/doc/serialized-data/.
  • The change happens for floating point numbers as PHP will round them to the number of significant bits set in serialize_precision setting in php.ini.
  • When running the test using your data, we were able to replicate the issue. The data is changed. But when we run it a second time, the data is not changed a second time since precision of the saved data matches the current environment.
  • When we set up Smush on a new site alongside BSR, the change to the data did not happen, as the data created by Smush used the same precision as BSR. We're guessing that your data was either created using a different PHP version or on a different machine.
  • The UI only shows the diff string returned from the backend and doesn’t perform any mathematical rounding on the values. Since the value has actually changed as a result of deserializing and re-serializing the data, the UI is showing you the difference, but that doesn't mean that a "match" has been detected.

Next Steps

Short-term Improvement

We are going to change the deserialization behavior to only deserialize strings when a match occurs. This will make it less likely that floating-point values would be changed as a side effect of deserialization. We are aware that this means the issue could still occur if floating-point data exists within the matched record.

Long-term Fix

We are going to explore an alternative approach to performing a search and replace on serialized data without actually having to run the PHP unserialize() function. This would require a much larger change to the internals of the plugin, but would avoid any side effects of deserialization even when a match occurs.

@TimSmith
Copy link
Author

TimSmith commented Mar 26, 2024

Thanks for that very thorough response - I also concluded that it could only happen on an environment change as it's really about floating point precision, so it's a pretty niche use case. On our clients site the Smush plugin wasn't even active any longer but I got a number of unexpected matches so exported the tables and searched myself - this showed that they were false matches. The hosting has been changed, so it could well have a different value for serialize_precision - reading up on it it seems that a value of -1 is probably better.

Short term. You're probably aware of this but note that you will still need to unserialize the data to check for a match because of escaped characters (e.g. slashes in a URL) which your plugin (unlike quite a few others) handles expertly. I'd suggest you only check for equality after re-serialization when a match was found.

Long term. That sounds hard to achieve as you're limited by the fact that floating point numbers will always have a limited precision that can vary by the environment. Personally, I'd be happy with no false matches being reported.
It's arguable whether changes to precision would matter as I'd have thought that with a change of environment the next time the data's source was used it'd be saved using the new floating point rounding anyway. Besides which you could also argue that saving it under the new precision is what the environment is asking you to do.

I suppose you could unserialize the changed and newly serialized data and see if the only change from the unmodified unserialized data was the one expected and if it isn't warn people that the environment had changed the floating point precision (e.g. on a dry run).

Thanks again. I appreciate the effort and the great plugin.

@kevinwhoffman
Copy link

Good points, I'll make sure the team factors that in to any changes made in this area.

@kevinwhoffman
Copy link

You're probably aware of this but note that you will still need to unserialize the data to check for a match because of escaped characters (e.g. slashes in a URL) which your plugin (unlike quite a few others) handles expertly. I'd suggest you only check for equality after re-serialization when a match was found.

@TimSmith I wanted to give you a heads up that we released Better Search Replace 1.4.6 with the short-term improvement described in #105 (comment). We looked into your feedback about escaped characters and confirmed it is not necessary to unserialize PHP serialized data in order to check for a match.

PHP's serialization mechanism does not escape characters in the same way JSON does. So while it's possible that a JSON-escaped URL would be missed due to extra slashes, this has always been the case and is not impacted by the changes in the short-term fix. The only noticeable change should be a performance improvement as a result of fewer unserialize() calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants