Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit 4: Data Indexing #35

Open
cheatsheet1999 opened this issue Sep 13, 2021 · 0 comments
Open

Unit 4: Data Indexing #35

cheatsheet1999 opened this issue Sep 13, 2021 · 0 comments

Comments

@cheatsheet1999
Copy link
Owner

cheatsheet1999 commented Sep 13, 2021

Lesson Introduction: Major Indexing Schemes in Database Systems

Screen Shot 2021-09-12 at 8 04 20 PM

Topic: Hash-based Indexes

Hash-based indexes, the hash function is built upon the function that we do the hasing on. For example, id = 2, the hash function is on id

  • Good for equality selections
  • ⬇️ The formular decide which bucket the data stored in

Screen Shot 2021-09-22 at 2 34 19 PM

Index entries help people to navigate and search, data entries is the last level where people retrieve the data where it's stored in the database

Lesson Introduction: Index Classification

We know that data is stored in the form of records. Every record has a key field, this make this field be recognized uniquely. An index is a data structure that locates these key fields within the database file. There are several different indexing systems.
Screen Shot 2021-09-22 at 2 45 24 PM

Topic: Index Classification

  • Clustered index is very well organized, its the same sorting order, the data entries and data records have.
  • Unclustered index: The sorting order in the data entry is not similar to the sorting order of the data records

One table can only have one clustred index since sorting order need to follow data records in a exact same way, but we can have as many unclustered index as needed, because there are many combinations of sorting order.

Index selection guidelines

  • Attribute in WHERE clause. (These are good candidate to add index keys), if the index we are gonna create is benefit multiple queries, that is a good criteria.

Screen Shot 2021-09-22 at 3 13 51 PM

If condition is: age = 30 AND 3000 < salary < 5000
Clustered <age, salary> index is much better than <salary, age>, since age is a euqality attribute and lot more specific and selective

Knowledge Check: Major Indexing Schemes

  1. Which of the following database operations can be achieved by using hash-based indexes?
  • To retrieve student data where age is between 18 and 20.

  • [Correct] To retrieve student data where age is equal to 20. (Hash-based indexes are good for equality selections.)

  • To retrieve student data where age is less than 20.

  • To retrieve student data where age is different than 20

  1. If a database table has m rows and n columns, what is the maximum number of clustered indexes for this table?
  • mn
  • m
  • n
  • [Correct!] 1 (A database table can only have one clustered index.)

Unit 3 & 4: Practice Quiz

  1. Which of the following file organizations must be used if one needs to retrieve all records in random order?
  • Heap Files
  1. For which type of operations are heap files a sufficient method?
  • Bulk loading data (Bulk loading data does not require special treatment. Therefore, heap files are sufficient for such operations.)
  1. What is the purpose of file indexes?
  • Searching Data (File indexes are created to speed up data search process.)
  1. In terms of decreasing the time it takes to process data searches, when does it make sense to create indexes for a database table?
  • When data retrieved by queries is a small percentage of the available data in the table. (Indexes are most useful when data retrieved by queries is a small percentage of the available data.)
  1. Creating indexes is not a straightforward decision because of the costs involved. One must answer certain questions before creating indexes. Which of the following questions is not relevant?
  • Should one build multiple indexes?
  • [Correct] Which record(s) should be part of an index?
  • Which field(s) should be part of an index?
  • Which tables should be indexed?

Unit 3 & 4: Quiz

  1. Which of the following computer components has the fastest processing speed?
  • CPU
  1. Which of the following computer components has the slowest processing speed?
  • Hard Disk
  1. The buffer manager loads pages from hard disk to which part of the computer?
  • Memory
  1. Which of the following statements about heap files are correct?
  • Heap files are in random order.
  1. What kind of file organization is sufficient for bulk loading data?
  • Heap Files (Bulk loading data does not require special treatment. Therefore, heap files are sufficient for such operations.)
  1. What kind of indexing scheme should be used in order to retrieve data on customers whose zip code is equal to 06902?
  • Hash-based index (Hash-based indexes are good for equality selections.)
  1. Which of the following is the biggest disadvantage of using indexes in a database?
  • Indexes require extra storage, which is a disadvantage
  1. Which of the following is not a decision variable when building indexes?
  • The records to be included in an index. (The records are the outcome of an index; they are not a decision variable.)
  1. Where are index files permanently stored in a computer?
  • Hard disk

Screen Shot 2021-09-12 at 8 10 52 PM

  1. Where is database stored in a computer?
    Central Processing Unit
    ✅ Hard disk (A database is stored in the hard disk of a computer)
    Memory
    Cache

  2. What is the correct order of processing speed of major units in a computer from the fastest to slowest?
    ✅ CPU, cache, memory, hard disk

  3. Why is the processing speed of a traditional computer hard disk lower than a modern solid state drive (SSD)?
    ✅ Because hard disk is a mechanical device. (Contrary to solid state drive, a hard disk has to spin and spend more time to
    find a requested data byte
    )
    Because solid state drive is a mechanical device.
    Because the size of a solid state drive is bigger than that of a hard disk.
    Because a hard disk can only read pages in sequence.

  4. What is the name of the software component in a computer that loads pages from hard disk in to memory?
    ✅ Buffer Manager

  5. Which of the following database operations can be achieved by using hash-based indexes?
    ✅ To retrieve student data where age is equal to 20.
    Hash-based indexes are good for equality selections.

  6. If a database table has m rows and n columns, what is the maximum number of clustered indexes for this table?
    ✅ A database table can only have one clustered index.

  7. Which of the following file organizations must be used if one needs to retrieve all records in random order?
    ✅ Heap Files

  8. ** For which type of operations are heap files a sufficient method?**
    ✅ Bulk loading data does not require special treatment. Therefore, heap files are sufficient for such operations.

  9. What is the purpose of file indexes?
    ✅ File indexes are created to speed up data search process.

  10. In terms of decreasing the time it takes to process data searches, when does it make sense to create indexes for a database table?
    ✅ Indexes are most useful when data retrieved by queries is a small percentage of the available data.

  11. Creating indexes is not a straightforward decision because of the costs involved. One must answer certain questions before creating indexes. Which of the following questions is not relevant?
    Should one build multiple indexes?
    ✅ Which record(s) should be part of an index?
    Which field(s) should be part of an index?
    Which tables should be indexed?

@cheatsheet1999 cheatsheet1999 changed the title Data Storage Unit 4: Data Indexing Sep 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant