Increase BaseX storage limits: namespaces, number of nodes #902

Open
gimsieke opened this issue Mar 21, 2014 · 17 comments

Comments

@gimsieke
Contributor

in 7.8, it’s still 256

@ChristianGruen
Member

Requires a new storage layout. Will probably be aligned with a higher node id limit (which would also fix #676).

@ChristianGruen changed the title from "Increase the max. number of namespaces in a DB" to "Increase BaseX storage limits: namespaces, number of nodes" on Sep 17, 2015
@ChristianGruen
Member

See also #1193

@innovimax

subscribe

@innovimax

https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Data.java#L27

    • The table is limited to 2^31 entries (pre values are signed ints)
    • A maximum of 2^15 different element and attribute names is allowed
    • A maximum of 2^8 different namespaces is allowed
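
For a sense of scale, the three limits quoted from Data.java can be written out as plain numbers. This is only a sketch: the class and constant names below are hypothetical and are not taken from the BaseX source.

```java
// Illustrative only: the constants mirror the limits quoted above.
// The class and field names are hypothetical, not BaseX identifiers.
final class StorageLimitsSketch {
    /** Table entries: pre values are signed 32-bit ints, so at most 2^31 entries. */
    static final long MAX_TABLE_ENTRIES = 1L << 31;
    /** Distinct element and attribute names: 2^15. */
    static final int MAX_NAMES = 1 << 15;
    /** Distinct namespaces: 2^8. */
    static final int MAX_NAMESPACES = 1 << 8;

    public static void main(String[] args) {
        System.out.println("table entries: " + MAX_TABLE_ENTRIES); // 2147483648
        System.out.println("names:         " + MAX_NAMES);         // 32768
        System.out.println("namespaces:    " + MAX_NAMESPACES);    // 256
        // The largest pre value is one less than the number of entries.
        System.out.println(Integer.MAX_VALUE == MAX_TABLE_ENTRIES - 1); // true
    }
}
```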

@malamut2
Contributor

We also ran into the node limit today; the corresponding error message is 'Insertion at beginning of populated table.' This is pretty bad: the system has been running faithfully for years, and then all of a sudden it's 'game over'.

I understand that limits must exist, and Integer.MAX_VALUE is of course a typical limit in Java. However, it would be great if the existing limits could be documented somewhere more prominently. Also, I feel that all three limits mentioned in innovimax's last comment are likely to be exceeded by standard real-world applications nowadays. A new storage layout is a big task, but I fear it will be needed.

@hhv
Contributor

hhv commented Mar 31, 2016

Hi,

we ran into the same problem loading all Dutch legislation into BaseX. Although partitioning the dataset is an option, I think BaseX should be able to handle a real-world use case like this without users having to 'work around' its limitations.

@cfoster
Contributor

cfoster commented Apr 19, 2016

What about an HDFS implementation of BaseX Database?
To distribute the load of data across HDFS nodes?
Just a thought.

@ChristianGruen
Member

> What about an HDFS implementation of BaseX Database?

Late reply, but better than none: the PAXQuery engine (Homepage, Paper) is worth mentioning. It utilized BaseX to speed up queries via HDFS and Apache Flink.

@ChristianGruen added this to the HOLD milestone on Dec 7, 2016
@gimsieke
Contributor Author

gimsieke commented Feb 1, 2018

Will this be fixed in 9.0?

@ChristianGruen
Member

As this would require a completely new storage layout, it would be quite a breaking change. However, it might become an option if we find a potent sponsor.

@gimsieke
Contributor Author

gimsieke commented Feb 1, 2018

While I probably cannot convince my fellow managing directors to fund this single issue (so that I can finally index all the XSLT/XProc code and all other XML files on my hard disk), I will suggest that we make a lump sum donation that you might use for stuff like this.

We’ve been contemplating adding some “GitHub issue crowdfunding functionality” to our transpect repos, for issues that don’t have priority for us to fix but where users can collectively fund fixes. We’ve been (very briefly) looking at https://freedomsponsors.org/. Maybe this is interesting for BaseX, too.

I suggest that we discuss it privately or open another issue for this and solicit user feedback on the mailing list.

@ChristianGruen
Member

Thanks for the link to freedom sponsors, could be interesting indeed!

@kgaleazzi

We apparently ran into the limit with BaseX 9.4.5. How can we check if the limit has been reached?
We are inserting data and consistently get this error (with slightly different numeric values) when the process reaches a certain point:

java.lang.RuntimeException: Data Access out of bounds:

  • pre value: 2147479679
  • table size: -2147479197
  • first/next pre value: -2147479425/-2147479197
  • #total/used pages: 8388848/8388848
  • accessed page: 8388847 (8388848 > 8388847]
    at org.basex.util.Util.notExpected(Util.java:61)
    at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:474)
    at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:158)
    at org.basex.data.Data.kind(Data.java:312)
    ...
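
One plausible reading of the negative "table size" in this trace is that a signed 32-bit counter wrapped around after passing Integer.MAX_VALUE. That is an assumption and is not confirmed anywhere in this thread. Under that assumption, a minimal Java sketch (the class and method names are hypothetical) recovers the count the negative value would stand for:

```java
// Sketch under an assumption: the negative "table size" is a signed
// 32-bit int that wrapped around. Names here are hypothetical.
final class PreOverflowSketch {
    /** Reinterprets a possibly wrapped signed int as an unsigned count. */
    static long unsignedCount(int wrapped) {
        return Integer.toUnsignedLong(wrapped);
    }

    public static void main(String[] args) {
        int reportedSize = -2147479197;             // "table size" from the trace
        long actual = unsignedCount(reportedSize);
        System.out.println(actual);                     // 2147488099
        System.out.println(actual - Integer.MAX_VALUE); // 4452 entries past the largest int
        System.out.println((int) actual == reportedSize); // true: narrowing to int wraps again
    }
}
```

If the assumption holds, the table was only a few thousand entries past the limit when the access failed.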

@ChristianGruen
Member

@kgaleazzi It seems we need to add some more limit checks. How do you insert your data?

@kgaleazzi

We use 'insert node as last' as we rely on the node order with our application. Please advise if there are better ways to insert data and avoid the out of bounds exceptions.

@ChristianGruen
Member

In the long term, insertions should be rejected by BaseX if the database limits are exceeded.

One manual way to avoid out of bounds exceptions is to check the current number of nodes before inserting data via XQuery:

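(: Soft limit, kept below the hard limit of 2^31 table entries :)
declare variable $LIMIT := 2000000000;

(: $db is assumed to be bound to the name of the database :)
let $size := db:property($db, 'size')
return if ($size > $LIMIT) then (
  error((), 'Database node limit is reached.')
) else (
  insert ...
)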

@kgaleazzi

That helps, thank you. So the limit accounts for an estimate of the maximum size of the data being inserted (2^31 - $LIMIT).
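
The headroom arithmetic can be spelled out in a few lines of Java. This is only a sketch of the reasoning in this exchange: the class, constant, and method names are hypothetical, and the soft limit is simply the $LIMIT value from the XQuery snippet.

```java
// Sketch of the headroom reasoning; all names are hypothetical.
final class NodeLimitGuardSketch {
    /** Hard limit: the table holds at most 2^31 entries. */
    static final long HARD_LIMIT = 1L << 31;
    /** Soft limit: the $LIMIT value used in the XQuery check. */
    static final long SOFT_LIMIT = 2_000_000_000L;

    /** Number of nodes one insert may add once the size check has passed. */
    static long headroom() {
        return HARD_LIMIT - SOFT_LIMIT;
    }

    /** True if the size check passes and the insert fits into the headroom. */
    static boolean fits(long currentSize, long nodesToInsert) {
        return currentSize <= SOFT_LIMIT && nodesToInsert <= headroom();
    }

    public static void main(String[] args) {
        System.out.println(headroom());                          // 147483648
        System.out.println(fits(1_999_000_000L, 1_000_000L));    // true
        System.out.println(fits(2_000_000_001L, 1L));            // false
    }
}
```

The computation uses long values on purpose: adding node counts near 2^31 in int arithmetic would itself overflow.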

7 participants