Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor document code, and avoid [un]marshaling doc bodies #6108

Open
wants to merge 87 commits into
base: main
Choose a base branch
from

Conversation

snej
Copy link
Contributor

@snej snej commented Feb 22, 2023

CBG-2951

Moved most of the Document, Revision and RevTree code, and a lot of Attachment stuff, out of the db package into a new document package. (I'm not wild about that name, but doc would conflict with the very common parameter name doc, and docs is already taken as a directory.) Two reasons for this:

  1. Creates a privacy boundary and lets these structs, esp. Document, protect their private fields from being groped by other code in the db package. There's a fair bit of this going on with the _body and _rawBody fields.)
  2. The db package is really big and it would be nice to have smaller packages, IMHO, just for readability and code navigation.

To reduce the number of diffs, I created a file db/doc_model.go that creates aliases for many of the types and constants that were moved. I also added an alias for the function ParseRevID because there are so damn many calls to it.

To cope with the code in db that touched Document's _body and _rawBody fields, I added methods PeekBody, PokeBody, PeekRawBody, PokeRawBody. These can be removed after the code that calls them is fixed.

There are no functional changes in this PR; everything should behave exactly as before.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

@snej snej force-pushed the refactor-document branch 3 times, most recently from 5df9a94 to f811b49 Compare February 23, 2023 19:42
@snej snej marked this pull request as ready for review February 23, 2023 20:00
@snej snej changed the title Moved document, revision, attachment stuff into new docmodel package Moved document, revision, attachment stuff into new 'document' package Feb 23, 2023
- JSONExtract now uses a callback to delegate the logic about which keys are
  special/reserved.
- implemented JSONExtract for CE
- DocumentRevision calls JSONExtract, handles special properties.
# Conflicts:
#	rest/replicatortest/replicator_test.go
Body changed to UnmarshalBody.
Removed some usages of both methods, working with JSON data instead.
- RevisionCache.GetRevision no longer returns a Body
- Removed body field from revCacheValue struct
@adamcfraser
Copy link
Collaborator

@snej You indicate that there are no functional changes in this PR - how does that apply to the changes around unmarshalling document bodies? Can you summarize the changes being made to that functionality?

@snej
Copy link
Contributor Author

snej commented May 16, 2023

@adamcfraser In this PR the document bodies are still unmarshaled when passed to JS (Otto). That will change with the V8 PR, at least when V8 is enabled; there we pass the JSON as a string and unmarshal it in JavaScript which is more efficient.

Otherwise, incoming doc bodies are still validated to make sure they don't contain illegal/reserved properties, but that's done without unmarshaling. The JSON does get parsed, but without converting it to Go objects which is the expensive part.

bbrks and others added 20 commits May 17, 2023 12:51
* remove cached buckets
* Add refcounting for parallel calls to get buckets
* Update waiting error message from generic message

Co-authored-by: Ben Brooks <ben.brooks@couchbase.com>
* Tweak TestIncrCounter to cover non-equal `def` and `amt` values

* Allow zero value in `amt` for Collection.Incr - will write `def` to bucket if counter does not exist
* CBG-2983 Close cbgt agents on database close

Manager.Stop() doesn’t close cbgt’s gocb agents - partly because the fts use case is to have one manager that exists for the lifetime of the process.  For SG’s usage, where manager lifecycle is bound to a database, we need to shut down these agents when we close the database/importListener.

* Improve inline documentation for CloseStatsClients call
* CBG-3024 Make sure import feed uses checkpoints
* CBG-3001 Avoid bucket retrieval error during OnFeedClose

OnFeedError was checking for bucket existence to determine whether to call NotifyMgrOnClose().  This handling isn't necessary for SG, as we want database close to handle shutdown of the import feed (via importListener.Stop()) in the case of a deleted bucket.
# Conflicts:
#	documents/revision_cache_interface.go
* CBG-3028: fixes for failing CE tests

* remove print line

* updates off comment
…6258)

* CBG-2793: attachment compaction code erroneously sets failOnRollback

* spelling errors and test logging removal + lint error

* updates for comments and try fix race condition

* linter error fix

* changes for jenkins

* updates to add handling for cleanup phase

* remove logging line

* fix race in Jenkins

* updates to fix collections issue

* updates based off comment

* updates

* updates to address complexity comments
@snej
Copy link
Contributor Author

snej commented Jun 5, 2023

Created a perf testing task CBPS-1176 for this.

@adamcfraser
Copy link
Collaborator

Perf results show a decent ~5% improvement in throughput (and accompanying reduction in CPU) - we should go ahead and review/merge based on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants