## 🧪 Testing Tools

| Code-name | Hardware | Estimated CPU cost |
| --- | --- | --- |
| M1 | Apple M1 MacBook Air, 8 GB / 256 GB, 8 cores, run in Terminal | US$45 (Google) |
| PC | Windows 10, i7-6700K @ 4.00 GHz (4 cores / 8 threads), run in PowerShell | US$280 (Intel) |
| M2 | Apple M2 Mac Mini, 24 GB / 1 TB, 8 cores, run in Terminal, using Python 3.11 | |
| LX | The PC above, running a Linux distro (specifically ElementaryOS) | |
| LX2 | My friend **@harvmaster**'s PC running Linux: i7-12700H, Python 3.10.12 | |

LX = the same specs as PC, but run on a Linux distro (ElementaryOS).

From here down, the Python versions are consistent (PC: 3.8.5, M1: 3.9.2) unless specified otherwise.

### 🔥 1-one-to-million.py

  • M1: 1.78s
  • M1 Python 2: 0.98s
  • M2: 1.27s
  • PC: 39.89s
  • LX: 2.71s
  • LX2: 1.148s
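The script itself isn't shown here, but going by its name (and the future-ideas list below), a minimal sketch of this benchmark, assuming it simply prints the numbers 1 to 1,000,000 and times the loop, might look like:

```python
import time

def count_to(n: int = 1_000_000) -> float:
    """Print 1..n and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for i in range(1, n + 1):
        print(i)
    return time.perf_counter() - start

# Full benchmark run:
# print(f"{count_to():.2f}s")
```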

### 🔥 2-threaded-scraping.py

In the test below there are two variables: the total number of threads (which equals the total number of requests) and how many run at-once. If we run 50 threads, we make 50 requests in total; if we do 10 at-once, those requests are performed in batches of 10. The at-once figure is the important one, since it's a key contributor to scrape performance whether cloud-based or local.
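As a sketch of that batching scheme (the target URL and the request details are my assumptions, not the script's actual ones):

```python
import threading
import time
from urllib.request import urlopen

URL = "https://example.com"  # placeholder target, not the original script's URL

def fetch(url: str) -> None:
    """Make one GET request and read the body."""
    with urlopen(url, timeout=10) as resp:
        resp.read()

def run(total_threads: int, at_once: int) -> float:
    """Issue `total_threads` requests in batches of `at_once`; return elapsed seconds."""
    start = time.perf_counter()
    for done in range(0, total_threads, at_once):
        batch = [
            threading.Thread(target=fetch, args=(URL,))
            for _ in range(min(at_once, total_threads - done))
        ]
        for t in batch:
            t.start()
        for t in batch:
            t.join()
    return time.perf_counter() - start

# Example: 50 threads, 10 at-once
# print(f"{run(50, 10):.2f}s")
```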

  • M1, 50 threads, 50 at-once: 0.53s
  • M2, 50 threads, 50 at-once: 0.34s
  • PC, 50 threads, 50 at-once: 0.53s
  • LX, 50 threads, 50 at-once: 0.45s
  • LX2, 50 threads, 50 at-once: 4.52s

With these results, we 10×'d our required responses with only a ~6× increase in time-to-serve. This suggests the M1 is efficient at thread-switching (something Python naturally struggles with under the GIL).

But there seems to be a ceiling to this curve, of course. When we approach 1,000 threads at 1,000 at-once, we've stuffed the threads, and (my guess) Python falls into a millisecond-scale feedback loop where threads keep being switched before completing a whole request. But when we drop those 1,000 threads to 100 at-once, we return to our agile speeds: 6.8 seconds.

#### 🤔 The takeaway from this test...

Don't stuff your threads; they do enjoy their space. But don't be too cautious either: these are M1s, so you don't need to be boring and go as low as your core count 😉

### 🔥 3-beautiful-soups-per-second.py

A simple single-threaded compute test to observe how many soups a single thread can parse per second. Important for engineers looking for an efficient machine to scrape from. It times loading the page (screamers/sources/3-page.html), stripping non-ASCII characters (keeping only ord < 128), then performing 5,000 soup parses.

The test was run on a 6,053-character page.
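A sketch of the per-second measurement, assuming the standard `beautifulsoup4` package with its built-in `html.parser` backend (the helper name here is mine, not the script's):

```python
import time
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def soups_per_second(html: str, n: int = 5_000) -> float:
    """Strip non-ASCII characters (keep ord < 128), then parse `html` n times."""
    ascii_html = "".join(ch for ch in html if ord(ch) < 128)
    start = time.perf_counter()
    for _ in range(n):
        BeautifulSoup(ascii_html, "html.parser")
    return n / (time.perf_counter() - start)

# In the repo this would run against the benchmark page:
# with open("screamers/sources/3-page.html") as f:
#     print(f"{soups_per_second(f.read()):.0f} per-second")
```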

  • M1: 530 per-second (550 max)
  • M2: 1,035 per-second (used max)
  • PC: 485 per-second (489 max)
  • LX: 473 per-second
  • LX2: 867 per-second

The M1 outperforms the PC by 9.2%, beating my personal scrape workstation in a single-core approach.

### 🔥 4-sha256.py

Performs 1,000,000 iterations of a SHA-256 hash digest, generating a random 10-character string as the input to hash on each iteration.
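A sketch of this benchmark, assuming the random string is regenerated each iteration (the exact character set is my assumption):

```python
import hashlib
import random
import string
import time

def sha256_bench(n: int = 1_000_000) -> float:
    """Hash `n` random 10-character strings with SHA-256; return elapsed seconds."""
    chars = string.ascii_letters + string.digits  # assumed character set
    start = time.perf_counter()
    for _ in range(n):
        payload = "".join(random.choices(chars, k=10))
        hashlib.sha256(payload.encode()).hexdigest()
    return time.perf_counter() - start

# Full benchmark run:
# print(f"{sha256_bench():.2f} seconds")
```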

  • M1: 4.95 seconds
  • M2: 1.95 seconds
  • PC: 4.42 seconds
  • LX: 5.29 seconds
  • LX2: 2.20 seconds

### 🔥 5-jacobian.py

Performs 1,000,000 Jacobian computations. I'm not too familiar with this branch of mathematics, but I've seen many Bitcoin wallet-generation implementations use Jacobian mathematics. 1,000 total iterations, with t spanning 1→1,000 and a, b each spanning 1→1,000 within each t iteration. Lower is better.
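For context, the "Jacobian mathematics" in Bitcoin wallet code is elliptic-curve point arithmetic in Jacobian (projective) coordinates over the secp256k1 field. A sketch of one such computation, point doubling using the standard formulas for curve parameter a = 0 (illustrative only; the script's exact computation isn't shown here):

```python
# secp256k1 field prime
P = 2**256 - 2**32 - 977

def jacobian_double(x: int, y: int, z: int, p: int = P) -> tuple:
    """Double the point (X, Y, Z) in Jacobian coordinates on y^2 = x^3 + 7."""
    if y == 0:
        return (0, 0, 0)  # point at infinity
    ysq = (y * y) % p
    s = (4 * x * ysq) % p
    m = (3 * x * x) % p  # curve parameter a = 0 for secp256k1
    nx = (m * m - 2 * s) % p
    ny = (m * (s - nx) - 8 * ysq * ysq) % p
    nz = (2 * y * z) % p
    return (nx, ny, nz)

# Doubling the secp256k1 generator point (Z = 1 for affine input):
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
X2, Y2, Z2 = jacobian_double(GX, GY, 1)
```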

  • M1: 17.25 seconds
  • M2: 7.08 seconds
  • PC: 20.81 seconds
  • LX: 19.76 seconds
  • LX2: 7.28 seconds

### 🔥 6-threaded-compute.py

Just 50 threads, each filled with heavy compute: store the first 10,000,000 integers in a list, divide each by a float (5.2125), and return the length of the list.

  • M1: 255.48 seconds (~4 minutes) - 50 threads, 50 at-once
  • M1: 97.5 seconds - 50 threads, 8 at-once
  • M1: 49.87 seconds - 50 threads, 4 at-once
  • M1: 48.76 seconds - 50 threads, 1 at-once
  • M2: 21.45 seconds - 50 threads, 4 at-once
  • PC: untested
  • LX: 42.13 seconds - 50 threads, 4 at-once
  • LX2: 25.68 seconds - 50 threads, 4 at-once

This test was interesting, since the M1 used a lot of virtual memory: "Memory Used" displayed 28 GB of "Virtual Memory" on my 8 GB model. This further reinforces the idea of not stuffing threads in M1 Python.

In reality, I've since realised this comes down to the GIL and CPU-bound Python: while one thread is working, nothing is happening in the other 3/4 of the threads. I'll rewrite this in the future using multiprocessing (or another measure) to steer clear of this.
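A sketch of that future rewrite using `multiprocessing`, which sidesteps the GIL by running workers in separate processes (the task sizes here are scaled down; the actual benchmark uses 50 workers on 10,000,000 integers each):

```python
import time
from multiprocessing import Pool

def heavy_compute(size: int) -> int:
    """Build a list of the first `size` integers divided by 5.2125; return its length."""
    nums = [n / 5.2125 for n in range(size)]
    return len(nums)

if __name__ == "__main__":
    tasks = [1_000_000] * 8  # scaled down from the benchmark's 50 x 10,000,000
    start = time.perf_counter()
    with Pool(processes=4) as pool:
        lengths = pool.map(heavy_compute, tasks)
    print(f"{time.perf_counter() - start:.2f}s, lengths: {set(lengths)}")
```

Unlike the threaded version, each worker here gets its own interpreter and its own GIL, so CPU-bound work actually runs in parallel.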

## ❓ The M1 Screamers

Pushing to the limit: multiprocessing, threading, scraping, and compute tests on Apple M1 silicon and its Windows/Linux counterparts.

### 📝 Future Screamer Test Ideas:

  • Print the first million numbers
  • Threading a scrape and a BeautifulSoup parse with adjustable thread counts
  • Complex Jacobian mathematics, generating elliptic curves, etc.
  • Million-iteration cryptography (SHA-256, SHA-512, XMR's algo, etc.)
  • Video cut-and-render using moviepy (copy + hard encode)
  • My personal y12 heavy-compute smart-art functions (requires a separate project link; won't publish the source here)
  • My personal image re-interpretation functions
