Lesson Introduction: Major Data Storage Layouts
Much of the success of computer technology has come from the tremendous progress in data storage. When dealing with big data, the volume of data directly affects the performance of the system.
In this unit, we will learn about the memory hierarchy and about both internal and external data storage.
We will relate the need for external data storage to large internet-based applications, which require new and more scalable storage solutions, such as cloud-based storage systems like AWS.
Topic: Introduction to Data Storage
How is data stored?
Normally, data is stored on a secondary storage device, such as a hard disk. To process the data, the computer must first load it into main memory.
Processing Speed (fastest to slowest)
CPU Register => CPU Cache => Main memory => Hard disk (Secondary Storage)
Internal Data Storage
A hard disk is a mechanical device, which is why it is comparatively slow.
In the past, tape was widely used because it is very cheap.
Today, many systems use solid-state drives (SSDs), which are much faster than hard disks.
Data on External Storage
File
A logical collection of data, physically stored as a set of pages
File Organization
A method of arranging a file of records on external storage; each record is identified by its record ID (rid), which is sufficient to locate the record physically.
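A minimal sketch of a file stored as a set of pages, with a record ID modeled as a (page, slot) pair. The page capacity, the `Rid`, `Page`, and `File` classes, and the sample records are hypothetical illustrations, not part of any particular database system.

```python
from dataclasses import dataclass, field
from typing import NamedTuple

# Hypothetical page capacity, kept tiny so the page boundary is easy to see.
RECORDS_PER_PAGE = 3

class Rid(NamedTuple):
    """A record ID: the page a record lives on and its slot within that page."""
    page_id: int
    slot: int

@dataclass
class Page:
    records: list = field(default_factory=list)

@dataclass
class File:
    """A logical collection of records, physically stored as a list of pages."""
    pages: list = field(default_factory=list)

    def insert(self, record) -> Rid:
        # Start a new page when the file is empty or the last page is full.
        if not self.pages or len(self.pages[-1].records) >= RECORDS_PER_PAGE:
            self.pages.append(Page())
        page_id = len(self.pages) - 1
        self.pages[page_id].records.append(record)
        return Rid(page_id, len(self.pages[page_id].records) - 1)

    def get(self, rid: Rid):
        # The rid alone is enough to go straight to the right page and slot.
        return self.pages[rid.page_id].records[rid.slot]

f = File()
rids = [f.insert(name) for name in ["ann", "bob", "cai", "dee"]]
print(rids[3])         # Rid(page_id=1, slot=0)
print(f.get(rids[3]))  # dee
```

The fourth record spills onto a second page, and its rid still locates it directly without scanning.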
Architecture
The buffer manager stages pages from external storage into the main-memory buffer pool.
The file and index layers make calls to the buffer manager.
Why do we have a buffer manager?
Main memory is much smaller than disk, so we cannot load all the data into memory at once; instead, pages are loaded one at a time as they are needed.
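A minimal sketch of a buffer manager, assuming a fixed number of frames and a least-recently-used (LRU) replacement policy. LRU is chosen here for illustration only; real systems may use other policies such as clock. The `Disk` and `BufferManager` classes and their `pin`/`unpin` methods are hypothetical names.

```python
from collections import OrderedDict

class Disk:
    """Stand-in for external storage: maps page_id -> page contents and counts reads."""
    def __init__(self, pages):
        self.pages = pages
        self.reads = 0

    def read_page(self, page_id):
        self.reads += 1
        return self.pages[page_id]

class BufferManager:
    """Holds at most `capacity` pages; evicts the least recently used unpinned page."""
    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.pool = OrderedDict()  # page_id -> page, least to most recently used
        self.pins = {}             # page_id -> number of callers currently using it

    def pin(self, page_id):
        if page_id in self.pool:
            self.pool.move_to_end(page_id)  # hit: mark as most recently used
        else:
            if len(self.pool) >= self.capacity:
                self._evict()
            self.pool[page_id] = self.disk.read_page(page_id)  # miss: stage from disk
        self.pins[page_id] = self.pins.get(page_id, 0) + 1
        return self.pool[page_id]

    def unpin(self, page_id):
        self.pins[page_id] -= 1

    def _evict(self):
        # Oldest unpinned page is the victim; pinned pages must stay in memory.
        victim = next((pid for pid in self.pool if self.pins.get(pid, 0) == 0), None)
        if victim is None:
            raise RuntimeError("all buffer frames are pinned")
        del self.pool[victim]
        self.pins.pop(victim, None)

disk = Disk({i: f"page-{i}" for i in range(5)})
bm = BufferManager(disk, capacity=2)
for pid in [0, 1, 0, 2, 0]:
    bm.pin(pid)
    bm.unpin(pid)
print(disk.reads)     # 3
print(list(bm.pool))  # [2, 0]
```

Five page requests cost only three disk reads, because repeated requests for page 0 are served from the buffer pool.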
Lesson Introduction: Alternative File Organizations
Beyond the basic storage layout, there are several alternative file organizations. Each is ideal for some situations and not so good for others. In this topic, we explore heap (random-order) files, sorted files, and indexes.
The Cost Model
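The number of page accesses (page reads and writes) is used as the cost measure.
Reasoning
Page access cost is usually the dominant cost of database operations, and a fully accurate model would be too complex for analyzing algorithms.
The page-count model is a simplification. For example, reading three consecutive pages usually takes less time than reading three pages scattered across the disk, because sequential access avoids repeated seeks, yet both count as three page accesses.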
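A small sketch of how the page-count cost model compares file organizations. It assumes a file of B pages, an equality search on a unique key with matches uniformly distributed, and a sorted file searched by binary search over pages. These formulas are the usual textbook simplifications rather than something stated in these notes, and the file size is hypothetical.

```python
import math

def heap_scan(b):
    """Full scan of a heap file: every page is read once."""
    return b

def heap_equality_search(b):
    """Equality search on a unique key in a heap file: on average half the pages are read."""
    return b / 2

def sorted_equality_search(b):
    """Equality search on the sort key of a sorted file: binary search over the pages."""
    return math.ceil(math.log2(b))

B = 1000  # hypothetical file size in pages
print(heap_scan(B), heap_equality_search(B), sorted_equality_search(B))  # 1000 500.0 10
```

Like the model itself, these counts ignore whether the pages are read sequentially or at random.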
Heap File Advantages / Disadvantages
Advantages:
Efficient
for bulk loading data (record order does not matter, so new records are simply inserted as they arrive)
for relatively small relations, since indexing overheads are avoided
when queries need to fetch a large proportion of the stored records
Disadvantages:
Not efficient
for selective queries (finding a few matching records still requires scanning the whole file)
for sorting, which is time-consuming
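A minimal sketch of the heap file trade-off. Bulk loading just fills pages in arrival order, but a selection has no order to exploit and must read every page, however few records match. The page capacity and the functions `bulk_load` and `select_equal` are hypothetical.

```python
RECORDS_PER_PAGE = 100  # hypothetical page capacity

def bulk_load(records):
    """Heap file bulk load: append records in arrival order, filling one page at a time."""
    pages = []
    for r in records:
        if not pages or len(pages[-1]) == RECORDS_PER_PAGE:
            pages.append([])
        pages[-1].append(r)
    return pages

def select_equal(pages, key):
    """Selection on a heap file: with no order to exploit, every page must be read."""
    matches, pages_read = [], 0
    for page in pages:
        pages_read += 1
        matches.extend(r for r in page if r == key)
    return matches, pages_read

pages = bulk_load(range(10_000))
print(len(pages))               # 100
print(select_equal(pages, 42))  # ([42], 100)
```

A query that matches a single record still costs all 100 page reads, which is why heap files are a poor fit for selective queries.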
Indexes
File Index
Speeds up selections on the search key fields.
Any subset of the fields of a relation can be the search key for an index on the relation.
An index contains a collection of data entries and supports efficient retrieval of all data entries k* with a given value k
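A minimal in-memory sketch of an index: a map from each search key value k to its data entries k*, here modeled as lists of rids. It uses a plain Python dictionary as a stand-in, not a B+ tree or an on-disk hash index. The sample relation and `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical relation: rid (page_id, slot) -> record (name, age).
records = {
    (0, 0): ("ann", 30),
    (0, 1): ("bob", 25),
    (1, 0): ("cai", 30),
}

def build_index(records, key_fields):
    """Index on any subset of fields: search key value k -> list of rids (the data entries k*)."""
    index = defaultdict(list)
    for rid, rec in records.items():
        k = tuple(rec[i] for i in key_fields)  # the search key may span several fields
        index[k].append(rid)
    return index

age_index = build_index(records, key_fields=(1,))  # search key: age

# Retrieve all data entries with search key value k = (30,), then fetch each record by rid.
rids = age_index.get((30,), [])
print(rids)                        # [(0, 0), (1, 0)]
print([records[r] for r in rids])  # [('ann', 30), ('cai', 30)]
```

Passing several positions, for example `key_fields=(0, 1)`, builds an index on a composite search key.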
B+ Tree Indexes
The most popular index structure in database systems.
Non-leaf pages contain index entries, which are used only to direct searches; leaf pages contain the data entries.
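A simplified sketch of a B+ tree search. Non-leaf nodes hold only separator keys and route the search, and leaf nodes hold the (key, rid) data entries. The tree is built by hand, and insertion and node splitting are omitted. The sketch also assumes that no key's entries span two leaves. The class names and sample data are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    entries: list   # sorted (key, rid) data entries

@dataclass
class Inner:
    keys: list      # separator keys, used only to direct the search
    children: list  # len(children) == len(keys) + 1

Node = Union[Leaf, Inner]

def search(node: Node, k):
    """Descend from the root to a leaf, then return the rids of data entries with key k."""
    while isinstance(node, Inner):
        # Follow child i such that keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, k)]
    return [rid for key, rid in node.entries if key == k]

# Hand-built two-level tree (hypothetical data).
root = Inner(
    keys=[20, 40],
    children=[
        Leaf([(5, "r1"), (12, "r2")]),
        Leaf([(20, "r3"), (33, "r4")]),
        Leaf([(40, "r5"), (51, "r6")]),
    ],
)
print(search(root, 33))  # ['r4']
print(search(root, 40))  # ['r5']
print(search(root, 7))   # []
```

Each search visits one node per level, so the cost in page accesses grows with the height of the tree rather than with the size of the file.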
Knowledge Check: Data Storage
Where is the database stored in a computer?
Central Processing Unit
[Correct] Hard disk (A database is stored on the hard disk of a computer.)
Memory
Cache
What is the correct order of processing speed of the major units in a computer, from fastest to slowest?
CPU, cache, memory, hard disk
Why is a traditional hard disk slower than a modern solid-state drive (SSD)?
[Correct] Because a hard disk is a mechanical device. (Unlike a solid-state drive, a hard disk has to spin and move its read head, so it spends more time finding the requested data.)
Because a solid-state drive is a mechanical device.
Because a solid-state drive is larger than a hard disk.
Because a hard disk can only read pages in sequence.
What is the name of the software component that loads pages from the hard disk into memory?
Memory Manager
[Correct] Buffer Manager (The buffer manager loads pages from the hard disk into memory.)
Load Manager
Index Manager