Records are missing in sync #694

Open
sachinnagesh opened this issue Sep 6, 2023 · 5 comments

Comments

@sachinnagesh

sachinnagesh commented Sep 6, 2023

Hi @rwynn ,

We are facing a very strange issue with monstache. Some records are not synced to the Elasticsearch index at all. It affects roughly 5-10 records out of every 100 and appears random. It happens for both creates and updates. We also don't see any entries for the affected records in the monstache logs.
To give you an idea of our setup: we have a MongoDB replica set deployment with multiple databases, one per company (multi-tenant). From each company database we want to sync a MongoDB view created on the products collection.

db.createView("products-view",
"products",
[
  {
    $lookup: {
      from: "product-features",
      localField: "productid",
      foreignField: "productid",
      as: "features"
    }
  },
  {
    $lookup: {
      from: "product-technical-details",
      localField: "productid",
      foreignField: "productid",
      as: "technicals"
    }
  },
  {
    $lookup: {
      from: "product-inventory",
      localField: "productid",
      foreignField: "productid",
      as: "inventory"
    }
  }
])

e.g.
dbName : company1
collections : products, product-features, product-technical-details, product-inventory
dbName : company2
collections : products, product-features, product-technical-details, product-inventory

Here is what our monstache.toml file looks like:

mongo-url = "{{ .MongoURL }}"

elasticsearch-urls =[ "{{ .Elasticsearch.URL }}" ]
{{if .Elasticsearch.Auth.Enabled }}
elasticsearch-user = "{{ .Elasticsearch.Auth.UserName }}"
elasticsearch-password = "{{ .Elasticsearch.Auth.Password }}"
{{ end }}
{{if .Elasticsearch.SSL.Enabled }}
elasticsearch-pem-file = "{{ .Elasticsearch.SSL.Path }}"
{{ end }}

direct-read-namespaces=["company1.products-view","company2.products-view" ]

change-stream-namespaces=[ '' ]
namespace-regex='^(company1|company2)\.(products|product-features|product-technical-details|product-inventory)$'
gzip = true
stats = true
index-stats = true
dropped-collections = false
dropped-databases = false
replay = false
resume = true
resume-write-unsafe = false
resume-name = "default"
resume-strategy = 0
verbose = true
exit-after-direct-reads = false
direct-read-stateful = true
elasticsearch-retry = true
prune-invalid-json = true
relate-buffer = 500000
delete-index-pattern = "*_product-detail-index"

[gtm-settings]
buffer-duration = "100ms"

## Relate Mapping for company1
[[mapping]]
namespace = "company1.products-view"
index = "company1_product-detail-index"

[[relate]]
namespace = "company1.products"
with-namespace = "company1.products-view"
keep-src = false

[[relate]]
namespace = "company1.product-features|"
with-namespace = "company1.products"
src-field = "productid"
match-field = "productid"
keep-src = false

[[relate]]
namespace = "company1.product-technical-details"
with-namespace =  "company1.products"
src-field = "productid"
match-field = "productid"
keep-src = false

[[relate]]
namespace = "company1.product-inventory"
with-namespace = "company1.products"
src-field = "productid"
match-field = "productid"
keep-src = false

## Relate Mapping for company2
[[mapping]]
namespace = "company2.products-view"
index = "company2_product-detail-index"

[[relate]]
namespace = "company2.products"
with-namespace = "company2.products-view"
keep-src = false

[[relate]]
namespace = "company2.product-features"
with-namespace = "company2.products"
src-field = "productid"
match-field = "productid"
keep-src = false

[[relate]]
namespace = "company2.product-technical-details"
with-namespace =  "company2.products"
src-field = "productid"
match-field = "productid"
keep-src = false

[[relate]]
namespace = "company2.product-inventory"
with-namespace = "company2.products"
src-field = "productid"
match-field = "productid"
keep-src = false
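
One quick sanity check on a config like the one above is to run each `[[relate]]` namespace through the `namespace-regex` and see which ones fall outside it. The sketch below does that in plain JavaScript. It assumes this simple pattern behaves the same in JavaScript as in Go's regexp engine, and the helper name `namespacesOutsideRegex` is hypothetical, not part of monstache.

```javascript
// Pattern copied from namespace-regex in the config above.
const namespaceRegex =
  /^(company1|company2)\.(products|product-features|product-technical-details|product-inventory)$/;

// The `namespace` values of the company1 [[relate]] blocks, copied verbatim.
const relateNamespaces = [
  "company1.products",
  "company1.product-features|",
  "company1.product-technical-details",
  "company1.product-inventory",
];

// Hypothetical helper: returns the namespaces the regex does not match.
function namespacesOutsideRegex(regex, namespaces) {
  return namespaces.filter((ns) => !regex.test(ns));
}

const unmatched = namespacesOutsideRegex(namespaceRegex, relateNamespaces);
console.log(unmatched);
```

With the values as posted, the only entry reported is `company1.product-features|`. The trailing `|` may simply be a typo in the issue text; whether it affects syncing depends on how monstache matches relate namespaces, which is not confirmed here.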

We also tried setting the parameters below and removing namespace-regex, but the issue persists:

direct-read-namespaces=["company2.products","company2.product-features","company2.product-technical-details","company2.product-inventory"]
resume-strategy = 1

We think monstache is somehow missing those create/update events. We are using monstache:6.7.10.
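
To narrow down which records are affected, one option is to compare the ids in the MongoDB view against the ids in the Elasticsearch index. The sketch below covers only the comparison step. It assumes both id lists have already been exported as arrays (how they are fetched is not shown), and the helper name `findMissingIds` and the sample ids are made up for illustration.

```javascript
// Hypothetical helper: returns the ids present in MongoDB but absent from Elasticsearch.
// Ids are compared as strings so that mixed id types do not cause false mismatches.
function findMissingIds(mongoIds, elasticIds) {
  const indexed = new Set(elasticIds.map(String));
  return mongoIds.map(String).filter((id) => !indexed.has(id));
}

// Made-up sample ids, purely for illustration.
const mongoIds = ["p-1", "p-2", "p-3", "p-4"];
const elasticIds = ["p-1", "p-3"];

const missing = findMissingIds(mongoIds, elasticIds);
console.log(missing);
```

Knowing the exact ids makes it possible to check whether the missing documents share a pattern, such as being written by one particular service.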

@yunusemrecatalcam

Did it start happening recently? We are seeing the same problem, but we haven't changed the monstache config for 2 months, which is odd.

@sachinnagesh
Author

@yunusemrecatalcam Yes, we started facing this issue in the last 2-3 months.

@sachinnagesh
Author

@yunusemrecatalcam We found where the issue is coming from. When monstache fetches data from the Mongo view while processing a relate, it sometimes doesn't get the record at all during insertion. We have a MongoDB replica set deployment, and I suspect some of the services writing to the collections are not configured with write concern majority. For now we have added a retry mechanism (5 attempts) with a delay between attempts. There will still be an issue with updates, though: the fetch may not return the latest version of the record.
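
The retry idea described above could look roughly like the following. This is a generic sketch, not the code actually added by the author: the helper name `fetchWithRetry`, the 200 ms default delay, and the simulated lookup are all assumptions. The delay function is injected so that any blocking sleep can be used (for example the `sleep(ms)` global available in the mongo shell).

```javascript
// Hypothetical helper: calls fetchDoc until it returns a document or attempts run out.
// Returns the document, or null if every attempt came back empty.
function fetchWithRetry(fetchDoc, sleep, maxAttempts = 5, delayMs = 200) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = fetchDoc();
    if (doc !== null && doc !== undefined) {
      return doc;
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxAttempts) {
      sleep(delayMs);
    }
  }
  return null;
}

// Simulated lookup that only "sees" the document on the third call.
let calls = 0;
const delays = [];
const result = fetchWithRetry(
  () => (++calls >= 3 ? { productid: 42 } : null),
  (ms) => delays.push(ms) // records the delay instead of actually sleeping
);
console.log(calls, delays, result);
```

A retry like this only detects a record that is absent. It cannot tell a stale record from a current one, which is why it does not cover the update case mentioned above.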

@sachinnagesh
Author

@yunusemrecatalcam I think another way to solve this is to set readPreference to primary.
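
Both ideas in this thread (reading from the primary and writing with majority write concern) can be expressed as options in the MongoDB connection string, namely `readPreference=primary` and `w=majority`. The sketch below shows one way to add such an option to an existing URI. The helper name `setMongoUriOption` and the hostnames are hypothetical, and the helper assumes a well-formed URI whose credentials are already percent-encoded.

```javascript
// Hypothetical helper: sets (or replaces) one option in a MongoDB connection string.
function setMongoUriOption(uri, key, value) {
  const queryStart = uri.indexOf("?");
  const base = queryStart === -1 ? uri : uri.slice(0, queryStart);
  const query = queryStart === -1 ? "" : uri.slice(queryStart + 1);

  // Keep every existing option except the one being set (option names are case-insensitive).
  const options = query
    .split("&")
    .filter((pair) => pair !== "")
    .filter((pair) => pair.split("=")[0].toLowerCase() !== key.toLowerCase());
  options.push(key + "=" + value);

  // Make sure there is a "/" between the host list and the "?".
  const hostStart = base.indexOf("://") + 3;
  const hasSlash = base.indexOf("/", hostStart) !== -1;
  return base + (hasSlash ? "" : "/") + "?" + options.join("&");
}

// Made-up hosts, purely for illustration.
const original = "mongodb://db1.example:27017,db2.example:27017/?replicaSet=rs0";
const readerUri = setMongoUriOption(original, "readPreference", "primary");
const writerUri = setMongoUriOption(original, "w", "majority");
console.log(readerUri);
console.log(writerUri);
```

The first URI would be the kind of value used for monstache's `mongo-url`; the second would apply to the services that write to the collections. Whether either setting removes the missing records is not confirmed in this thread.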

@arcimen54

Hi,
I have a very similar problem.
I'm using these versions:

  • Elastic v7.17.9
  • Mongodb 6.0.14
  • Monstache 6.7.17

This is my toml file

mongo-url="mongo-url?readPreference=primary"
config-database-name="database-monstache"
elasticsearch-urls =["url"]
elasticsearch-validate-pem-file=false
elasticsearch-user="user"
elasticsearch-password="password"
elasticsearch-max-conns = 50
change-stream-namespaces = [ "collection1","collection2","collection3"]
replay = false
resume = true
resume-name = "default"
index-as-update = true
direct-read-no-timeout = true
elasticsearch-retry = true
fail-fast = false
stats = false
verbose = true
disable-change-events = false
enable-patches = true
[[mapping]]
namespace = "collection1"
index = "index1"
[[mapping]]
namespace = "collection2"
index = "index2"
[[mapping]]
namespace = "collection3"
index = "index3"
[[script]]
namespace = "collection1"
script = """
module.exports = function(doc) {
  if (doc.id) { 
    doc.owner = findId(doc.owner_id, {
      collection: "collection1"
    });
  }

  function removeKey(obj) {
    Object.keys(obj).forEach(function(key) {
      if (key === "_class") delete(obj[key]);
      if (typeof obj[key] === 'object' && obj[key] !== null) {
        removeKey(obj[key])
      }
    })
  }
  removeKey(doc);

  function isNumber(value) {
    if (value === null || value === undefined) {
      return false;
    }
    if (typeof value === "string") {
      return !isNaN(value) && !isNaN(parseFloat(value));
    }
    return !isNaN(value);
  }

  if (isNumber(doc.amount)) {
    doc.amount = doc.amount * 100
  }
  if (isNumber(doc.presales_amount)) {
    doc.presales_amount = doc.presales_amount * 100
  }

  return doc;
}
"""
[[relate]]
namespace = "collection1"
with-namespace = "collection2"
src-field = "_id"
match-field = "owner_id"
keep-src = true

It's happening for 5-10 records per 100 records and it's very random, exactly as @sachinnagesh reported.
Do you have any suggestions?
