RegexMatchError: __init__: could not find match for ^\w+\W #1918

Closed
talkinglim opened this issue May 6, 2024 · 58 comments

@talkinglim

# Download audio from yt playlist
from pytube import Playlist,YouTube

def download_audio_from_playlist(playlist_url, output_path):
    playlist = Playlist(playlist_url)
    for video in playlist.videos:
        audio_stream = video.streams.get_audio_only()
        audio_stream.download(output_path=output_path, filename=video.title + ".mp4")
        
playlist_url = "https://www.youtube.com/watch?v=yv77OZ_og-o&list=PLpODSd__yLPXQkQ0T8K_I637Rqo2f7NQo"
download_audio_from_playlist(playlist_url, 'audio/')
---------------------------------------------------------------------------
RegexMatchError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pytube/__main__.py in fmt_streams(self)
    180         try:
--> 181             extract.apply_signature(stream_manifest, self.vid_info, self.js)
    182         except exceptions.ExtractError:

7 frames
RegexMatchError: __init__: could not find match for ^\w+\W

During handling of the above exception, another exception occurred:

RegexMatchError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pytube/cipher.py in __init__(self, js)
     31         var_match = var_regex.search(self.transform_plan[0])
     32         if not var_match:
---> 33             raise RegexMatchError(
     34                 caller="__init__", pattern=var_regex.pattern
     35             )

RegexMatchError: __init__: could not find match for ^\w+\W

I have changed cipher.py as suggested in many past issues, but that doesn't fix it for me:

# line 30 of cipher.py from "var_regex = re.compile(r"^\w+\W")"
var_regex = re.compile(r"^\$*\w+\W")
@talkinglim talkinglim added the bug label May 6, 2024

github-actions bot commented May 6, 2024

Thank you for contributing to PyTube. Please remember to reference Contributing.md

@jwhogg

jwhogg commented May 6, 2024

Same issue here, worked earlier today, and had this error for the past few hours over different machines. Seems youtube may have changed something to break PyTube.

@simonesecchi94

Same issue here! It worked fine until yesterday on my side:

venv/lib/python3.10/site-packages/pytube/cipher.py", line 33, in __init__
raise RegexMatchError(
pytube.exceptions.RegexMatchError: __init__: could not find match for ^\w+\W

@taeksin

taeksin commented May 6, 2024

Same issue here!
Error: __init__: could not find match for ^\w+\W

It seems like there's an issue with pytube. I've been using this library to work on my graduation project, and I would greatly appreciate it if you could fix it as soon as possible. It's really urgent.

@mcosti

mcosti commented May 6, 2024

+1 on this issue

@mimbres

mimbres commented May 6, 2024

+1 today

@AbdelRahmanMohamed2002

My graduation project discussion is in 4 days; please fix it as soon as possible.

@Eric-A99

Eric-A99 commented May 6, 2024

+1 on this

@lumap

lumap commented May 6, 2024

+1, let's thank youtube for changing their function names 🙏

@DiegoLibonati

DiegoLibonati commented May 6, 2024

Same error here:

2024-05-06 18:05:41 - - [06/May/2024 21:05:41] "POST /v1/cut/cut_video HTTP/1.1" 500 -
2024-05-06 18:05:41 Traceback (most recent call last):
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/__main__.py", line 181, in fmt_streams
2024-05-06 18:05:41     extract.apply_signature(stream_manifest, self.vid_info, self.js)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/extract.py", line 409, in apply_signature
2024-05-06 18:05:41     cipher = Cipher(js=js)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/cipher.py", line 33, in __init__
2024-05-06 18:05:41     raise RegexMatchError(
2024-05-06 18:05:41 pytube.exceptions.RegexMatchError: __init__: could not find match for ^\w+\W
2024-05-06 18:05:41 
2024-05-06 18:05:41 During handling of the above exception, another exception occurred:
2024-05-06 18:05:41 
2024-05-06 18:05:41 Traceback (most recent call last):
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1498, in __call__
2024-05-06 18:05:41     return self.wsgi_app(environ, start_response)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1476, in wsgi_app
2024-05-06 18:05:41     response = self.handle_exception(e)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
2024-05-06 18:05:41     response = self.full_dispatch_request()
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 882, in full_dispatch_request
2024-05-06 18:05:41     rv = self.handle_user_exception(e)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
2024-05-06 18:05:41     rv = self.dispatch_request()
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
2024-05-06 18:05:41     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
2024-05-06 18:05:41   File "/home/app/blueprints/v1/cut_bp.py", line 19, in cut_video
2024-05-06 18:05:41     return cut_controller.cut_video()
2024-05-06 18:05:41   File "/home/app/controllers/cut_controller.py", line 36, in cut_video
2024-05-06 18:05:41     message, load_stream = video.get_better_stream()
2024-05-06 18:05:41   File "/home/app/models/video.py", line 54, in get_better_stream
2024-05-06 18:05:41     self.stream = self.video.streams.filter(
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/__main__.py", line 296, in streams
2024-05-06 18:05:41     return StreamQuery(self.fmt_streams)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/__main__.py", line 188, in fmt_streams
2024-05-06 18:05:41     extract.apply_signature(stream_manifest, self.vid_info, self.js)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/extract.py", line 409, in apply_signature
2024-05-06 18:05:41     cipher = Cipher(js=js)
2024-05-06 18:05:41   File "/usr/local/lib/python3.9/site-packages/pytube/cipher.py", line 33, in __init__
2024-05-06 18:05:41     raise RegexMatchError(
2024-05-06 18:05:41 pytube.exceptions.RegexMatchError: __init__: could not find match for ^\w+\W

These are my logs from Docker.

@talkinglim

Just a suggestion: my playlist is only a little over 2 hours long, so I downloaded all of it by other means and am bypassing pytube for now.

The work has to get done and I can't wait any longer.

@mcosti

mcosti commented May 6, 2024

I very quickly migrated to https://github.com/yt-dlp/yt-dlp which seems more mature anyway.

I am using it like this (using a named temporary directory and then returning bytes to be rendered by my request, adjust it to your own needs):

import tempfile

from yt_dlp import YoutubeDL
from yt_dlp.utils import DownloadError


def download_youtube_mp3(youtube_id, temporary_directory):
    url = f"https://www.youtube.com/watch?v={youtube_id}"
    ydl_opts = {
        'format': 'bestaudio/best',
        "max_filesize": 20 * 1024 * 1024,  # 20 MiB
        "outtmpl": f"{temporary_directory}/%(id)s.%(ext)s",
        "noplaylist": True,
        "verbose": True,
        # Convert the downloaded audio to MP3 (requires ffmpeg)
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }
    ydl = YoutubeDL(ydl_opts)
    try:
        meta = ydl.extract_info(
            url,
            download=True,
        )
    except DownloadError:
        raise  # handle or log the failure here as needed
    else:
        video_id = meta["id"]
        return {
            "title": meta["title"],
            "file_path": f"{temporary_directory}/{video_id}.mp3"
        }


# Caller side. The function name is assumed; the original snippet showed
# only the body of the calling code.
def get_audio_bytes(youtube_id):
    with tempfile.TemporaryDirectory() as temporary_directory:
        audio_details = download_youtube_mp3(youtube_id, temporary_directory)
        with open(audio_details["file_path"], "rb") as f:
            return f.read()

@rafleze

rafleze commented May 6, 2024

+1

@mgruppi

mgruppi commented May 6, 2024

Found a quick fix. It turns out the current function name includes "$" (as in "$O"), a non-word character that "\w+" will not match.

Edited to include @ghrist8p's comment: as per ECMA's spec, valid identifiers may include symbols "$" and "_" in addition to word-characters.

Change cipher.py line 30 to:

var_regex = re.compile(r"^[\w\$_]+\W")

I do not know all the possible patterns, but this is working for me.
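
To see why the original pattern fails, here is a minimal sketch using only Python's re module. The sample string is a hypothetical stand-in for the first entry of the transform plan (the real entries come from YouTube's player JavaScript and are not shown in this thread). The [:-1] slice mirrors how cipher.py drops the trailing non-word character from the match.

```python
import re

# Hypothetical first entry of the transform plan; real values come from
# YouTube's player JavaScript and change over time.
sample = "$O.kX(a,3)"

old_regex = re.compile(r"^\w+\W")       # original line 30 of cipher.py
new_regex = re.compile(r"^[\w\$_]+\W")  # fix suggested above

old_match = old_regex.search(sample)    # "$" is not a word character
new_match = new_regex.search(sample)

# Drop the trailing non-word character to get the variable name
var = new_match.group(0)[:-1]

print(old_match)  # None
print(var)        # $O
```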

@horue

horue commented May 6, 2024

var_regex = re.compile(r"^(\w|\$)+\W")

Worked.

@voxfox01

voxfox01 commented May 6, 2024

var_regex = re.compile(r"^(\w|\$)+\W")

Worked.

how would I get it to work in Colab?

@ghrist8p

ghrist8p commented May 6, 2024

As per the spec, an identifier can have a $ or _ anywhere in its name, so the regex pattern should be r"^[\w\$_]+[^\w\$_]" to be future-proof.
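
One case where the stricter terminator makes a difference, as a small sketch: when the input is nothing but an identifier ending in $, the \W version can backtrack and treat the final $ as the terminator, silently yielding a shortened name, while the stricter version reports no match. The input string is a made-up example, not taken from YouTube's player code.

```python
import re

loose = re.compile(r"^[\w\$_]+\W")          # terminator: any non-word char
strict = re.compile(r"^[\w\$_]+[^\w\$_]")   # terminator: non-identifier char

sample = "ab$"  # made-up input: a bare identifier with nothing after it

loose_match = loose.search(sample)
strict_match = strict.search(sample)

# Same slicing as cipher.py: drop the last matched character
loose_var = loose_match.group(0)[:-1] if loose_match else None

print(loose_var)     # ab  (the trailing "$" was consumed as the terminator)
print(strict_match)  # None
```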

@azarashi-3

var_regex = re.compile(r"^[\w\$_]+\W")

Thank you for your help!
I was really in trouble.
I appreciate your efforts.

I'm also looking forward to official updates from the program side.

@wasciutto

Found a quick fix - turns out current function name includes "$O", a non-word character which will not match "\w+". Change cipher.py line 30 to: (Edited to include @ghrist8p's comment)

var_regex = re.compile(r"^[\w\$_]+\W")

I do not know all the possible patterns, but this is working for me.

bless you

@taeksin

taeksin commented May 7, 2024

Where did you enter var_regex = re.compile(r"^[\w$_]+\W")? It wasn't in the pytube code I was using.

I'm getting an error in this code:

audio_file_path = os.path.join(output_folder, cleaned_title + '.mp3')
yt.streams.filter(only_audio=True).first().download(
    output_path=output_folder,
    filename=cleaned_title + '.mp3'
)

@nickpotafiy

nickpotafiy commented May 7, 2024

cipher.py on line 30. You can check my repo commit.

@taeksin

taeksin commented May 7, 2024

Thank you for your help.

@zhu6201976

I fixed this bug today.
vim site-packages\pytube\cipher.py
Modify line 30: var_regex = re.compile(r"^\w+\W") --> var_regex = re.compile(r"^.+?.")
Then it runs OK.

@iShouldNotCode

+1

@coco1718

coco1718 commented May 7, 2024

I've tried all of the patterns above, but they don't work. The original worked until yesterday. I'm working on Colab.

    var_regex = re.compile(r"^\w+\W")  =>  Code used until yesterday

    var_regex = re.compile(r"^\$*\w+\W") 

    var_regex = re.compile(r"^[\w\$_]+\W")
    var_regex = re.compile(r"^[\w$_]+\W")
    var_regex = re.compile(r"^(\w|\$)+\W")
    var_regex = re.compile(r"^[\w\$_]+[^\w\$_]") 
    var_regex = re.compile(r"^.+?.")

   None of them work.
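
For comparison, the sketch below runs every pattern from this list against three made-up transform-plan entries and prints the variable name each one would extract. The sample strings are assumptions for illustration only; which form YouTube's player code actually uses is not shown in this thread. A result of None means no match, and a shortened name means the pattern matched but would pass the wrong variable name on to the rest of cipher.py.

```python
import re

patterns = [
    r"^\w+\W",
    r"^\$*\w+\W",
    r"^[\w\$_]+\W",
    r"^[\w$_]+\W",
    r"^(\w|\$)+\W",
    r"^[\w\$_]+[^\w\$_]",
    r"^.+?.",
]

# Made-up entries: a plain name, a name starting with "$",
# and a name with "$" in the middle.
samples = ["DE.kX(a,3)", "$O.kX(a,3)", "b$O.kX(a,3)"]


def extract_var(pattern, text):
    """Return the matched text minus its last character, or None."""
    match = re.search(pattern, text)
    return match.group(0)[:-1] if match else None


results = {p: [extract_var(p, s) for s in samples] for p in patterns}
for pattern, names in results.items():
    print(f"{pattern:<22} {names}")
```

Under these assumptions, only the character-class and alternation variants recover the full name in all three cases. The pattern ^.+?. always yields just the first character, and ^\$*\w+\W handles a leading $ but not one in the middle of the name.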

@Gamer-Victorch
@coco1718 var_regex = re.compile(r"^[\w$_]+\W") still works for me somehow

@coco1718

coco1718 commented May 7, 2024

I think there's another problem, but I can't find it

@MammadTavakoli

MammadTavakoli commented May 7, 2024

I solved it with this code:

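import string

import requests
from pytube import Playlist, YouTube

# Assumed definitions (not shown in the original comment): translation
# tables that strip punctuation and digits from video titles.
table_punctuation = str.maketrans('', '', string.punctuation)
table_digit = str.maketrans('', '', string.digits)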

def download_stream(url, filename):
  """
  Simulates downloading a video stream from a URL and saves it to a file.

  Args:
      url: The URL of the video stream.
      filename: The name of the file to save the video to.
  """
  # Set chunk size for efficient downloading
  chunk_size = 1024

  try:
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Raise an exception for failed requests

    # Check for content type (video or not)
    if not response.headers['Content-Type'].startswith('video/'):
      print(f"Warning: Content type '{response.headers['Content-Type']}' might not be a video.")

    with open(filename, 'wb') as f:
      for chunk in response.iter_content(chunk_size):
        # Filter out keep-alive new chunks
        if chunk: 
          f.write(chunk)
      print(f"Download complete! Saved to: {filename}")

  except requests.exceptions.RequestException as e:
    print(f"Error downloading video: {e}")

# Function to download a video from YouTube using pytube
def download_video_by_pytube(url, base_path="//content//drive//MyDrive//YouTube//Download_video//",
                             playlist_start=1, 
                             dl_pytube_stream=True,
                             skip_download=False, 
                             max_resolution=1280,
                             target_languages=["en"], translated_lang=[]):

    # Check if the URL is for a playlist
    if 'list=' in url.lower():
        playlist = Playlist(url)
        folder_name = f"{playlist.title}_{playlist.owner}"
        playlist = list(playlist.video_urls)
    else:
        playlist = []
        playlist.append(url)
        youtube = YouTube(url)
        folder_name = f"{youtube.title}_{youtube.author}"

    for i, video_url in enumerate(playlist[playlist_start-1:]):
        youtube = YouTube(video_url)

        # Get the name of the video
        title = youtube.title.translate(table_punctuation).translate(table_digit)
        j = str(i+playlist_start).zfill(3)
        filename = f"{j}_{title}"
        print(f"{int(j)}-th from {len(playlist)} : {title}")
        filename=f"{filename}.mp4"
        try:
          if not dl_pytube_stream:
            video_dict = {}
            for streaming_data in youtube.streaming_data['formats']:
              if 'video/mp4;' in streaming_data['mimeType']:
                video_dict[streaming_data['url']] = int(streaming_data['qualityLabel'].split('p')[0])

            video_dict = filtered_dict = dict(filter(lambda x: x[1] <= max_resolution, video_dict.items()))
            videos = sorted(video_dict.items(), key=lambda x: x[1], reverse=True)
            # print(videos[0][0])
            download_stream(videos[0][0], filename)
          else:

              # Get the streams that have a resolution
            streams = [stream for stream in youtube.streams.filter(progressive=True, file_extension='mp4') if stream.resolution]
            print('************************')
            # Sort the streams by resolution in descending order
            streams.sort(key=lambda stream: int(stream.resolution.replace('p', '')), reverse=True)

            # Find the first stream that has a resolution less than or equal to max_resolution
            stream = next((stream for stream in streams if int(stream.resolution.replace('p', '')) <= max_resolution), None)

            # Download the video and subtitle and specify the filename
            if stream:
                if not skip_download:
                    stream.download(filename=filename)
                    print("Downloaded resolution is:", stream.resolution)
 
          print('_'*50)
        except Exception as e:
          print(e)

You must set dl_pytube_stream=False for now, so that the download does not go through youtube.streams.

@Ca-tt

Ca-tt commented May 7, 2024

+1
Same error appeared today!

@Ca-tt

Ca-tt commented May 7, 2024

No problema, @mahdjourOussama

@medhsv

medhsv commented May 7, 2024

I came across this issue only today, after using pytube for at least a month. I tried the "line 30" code modification, but it did not solve it.
I use the latest pytube version in Google Colab.
I opened the file named in the error, made the change, and then reran the cell that caused the error. That did not solve it.
If I'm doing something wrong, please correct me. I need help!

In Google Colab: If you are still getting the "\w+\W" regex error, make sure to go to Runtime -> Restart session after making changes to cipher.py. Otherwise the changes will not be loaded.

Thanks a lot! Finally got it working.
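
The restart is needed because Python keeps an already-imported module in memory and does not re-read its file when the file changes on disk. A minimal sketch of that behavior, using a throwaway module (demo_cipher_mod is a hypothetical name standing in for an installed file such as cipher.py):

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for an installed module such as cipher.py
    module_file = Path(tmp) / "demo_cipher_mod.py"
    module_file.write_text('PATTERN = r"^\\w+\\W"\n')

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    import demo_cipher_mod

    # Edit the file on disk, as you would edit cipher.py
    module_file.write_text('PATTERN = r"^[\\w$_]+\\W"\n')
    before_reload = demo_cipher_mod.PATTERN  # still the old value

    # Re-execute the module's source in the running interpreter
    importlib.reload(demo_cipher_mod)
    after_reload = demo_cipher_mod.PATTERN   # now the edited value

    sys.path.remove(tmp)

print(before_reload)
print(after_reload)
```

For pytube itself, restarting the session as described above is the simpler route, since other already-imported modules may still hold references to the old objects even after a reload.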

barapa added a commit to barapa/pytube that referenced this issue May 7, 2024
@abhiram-ar

please merge the fix

@radio-satellites

+1 can confirm bug

@EnzoBustos

+1 I'm getting the same bug in Colab as well.

@NataLeite

+1 I'm getting the same bug in Colab.

@veered

veered commented May 7, 2024

If you just want to monkey patch in the fix, put this file somewhere and then import it before you use pytube.

It's a copy of the normal pytube cipher.py file but with the relevant regex changed to the above suggested var_regex = re.compile(r"^[\w\$_]+\W"). It then monkey patches the __init__ on the original Cipher class.

Most of the file isn't necessary, and is just there because it's a straight copy of cipher.py. I haven't tested it much, so no guarantees on robustness.

@abhiram-ar

@mahdjourOussama you can substitute pytube's class Cipher with your custom class CustomCipher.

  1. Import pytube's cipher module into your code
  2. Write your own class by copy-pasting Cipher and replacing one line of code
  3. Replace Cipher with your custom class

import re
from typing import List

from pytube import cipher

class CustomCipher:
    def __init__(self, js: str):
        self.transform_plan: List[str] = cipher.get_transform_plan(js)
        # var_regex = re.compile(r"^\w+\W")
        var_regex = re.compile(r"^[\w\$_]+\W")
        # other class code...

cipher.Cipher = CustomCipher

It should work if done properly.

Working perfectly after a session restart in Colab.

jonaschuman added a commit to volkno-inc/pytube that referenced this issue May 7, 2024
@CanErsoy20

For those who are trying to solve this in a Docker container (I suppose it works for Google Colab as well, but I'm not sure):

  1. Fork the pytube repository.
  2. Make the fix mentioned above (changing line 30 of cipher.py to var_regex = re.compile(r"^[\w$_]+\W")) in the forked repository.
  3. Commit the changes.
  4. In your requirements.txt file (or wherever you install the required libraries), instead of pytube==15.0.0, use pytube @ git+https://github.com/repo_owner_name/pytube

kentaroishizaki added a commit to kentaroishizaki/pytube that referenced this issue May 8, 2024
mzeeshanaltaf added a commit to mzeeshanaltaf/pytube_cipher_fix that referenced this issue May 8, 2024
Updated the following code:

Comment out line # 30: var_regex = re.compile(r"^\w+\W")

Added following line:
 var_regex = re.compile(r"^[\w$_]+\W")

This fixes the following issue: pytube#1918
slarty667 added a commit to slarty667/pytube that referenced this issue May 8, 2024
RegexMatchError: __init__: could not find match for ^\w+\W - see pytube#1918
rafleze added a commit to rafleze/pytube that referenced this issue May 8, 2024
@pcastagnaro

The CustomCipher substitution described above worked perfectly!

@Gamer-Victorch

Gamer-Victorch commented May 8, 2024

The CustomCipher substitution described above somehow did not work for me, so I wrote this code:

from typing import List

from pytube import cipher
from pytube.exceptions import RegexMatchError  # needed by the raise below

def fix(self, js: str):
    import re  # added because re isn't imported here
    self.transform_plan: List[str] = cipher.get_transform_plan(js)  # "cipher." is added to every function call below so that it resolves
    var_regex = re.compile(r"^[\w\$_]+\W")  # the line 30 modification
    var_match = var_regex.search(self.transform_plan[0])
    if not var_match:
        raise RegexMatchError(
            caller="__init__", pattern=var_regex.pattern
        )
    var = var_match.group(0)[:-1]
    self.transform_map = cipher.get_transform_map(js, var)
    self.js_func_patterns = [
        r"\w+\.(\w+)\(\w,(\d+)\)",
        r"\w+\[(\"\w+\")\]\(\w,(\d+)\)"
    ]

    self.throttling_plan = cipher.get_throttling_plan(js)
    self.throttling_array = cipher.get_throttling_function_array(js)

    self.calculated_n = None

cipher.Cipher.__init__ = fix

This worked for me. Remember to run it in the global scope, and don't import cipher (or use from pytube import *) again afterwards.

@paulofeh
Copy link

paulofeh commented May 8, 2024

@Gamer-Victorch's Cipher.__init__ patch above worked for me, too. Running on Kaggle. Thanks!

@utkxxrsh

utkxxrsh commented May 8, 2024

Thanks, the yt-dlp method that @mcosti posted above works fine and seems more stable.

@tdpatil

tdpatil commented May 8, 2024

I tried the CustomCipher substitution described above: I created a new file named custom_cipher, renamed the class Cipher to CustomCipher, and then assigned cipher.Cipher = CustomCipher. In Docker, however, my CustomCipher class is never called. Please advise.
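
One possible explanation, sketched with simplified stand-ins rather than pytube itself: if another module imported the class by name (assuming extract.py uses something like from pytube.cipher import Cipher, which is consistent with the cipher = Cipher(js=js) call in the traceback above), that module keeps its own reference to the original class. Rebinding cipher.Cipher afterwards does not change that reference, whereas patching the class's __init__ in place affects every reference. All names below are hypothetical.

```python
from types import SimpleNamespace


class Cipher:
    """Simplified stand-in for the original class."""

    def __init__(self):
        self.pattern = r"^\w+\W"


# Stand-ins for two modules. `extract` receives its own reference to the
# class, as a `from ... import Cipher` statement would give it.
cipher = SimpleNamespace(Cipher=Cipher)
extract = SimpleNamespace(Cipher=cipher.Cipher)


def apply_signature():
    # Looks the class up under extract's own name, not through `cipher`
    return extract.Cipher().pattern


class CustomCipher:
    def __init__(self):
        self.pattern = r"^[\w\$_]+\W"


cipher.Cipher = CustomCipher         # rebinds only the name inside `cipher`
after_rebinding = apply_signature()  # extract still uses the old class


def fixed_init(self):
    self.pattern = r"^[\w\$_]+\W"


Cipher.__init__ = fixed_init         # changes the class object itself
after_patching = apply_signature()   # every reference sees the change

print(after_rebinding)
print(after_patching)
```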

@MammadTavakoli

MammadTavakoli commented May 9, 2024 via email

@Gamer-Victorch

Gamer-Victorch commented May 9, 2024

Add this at the top of your code and it will override the cipher. I wrote it because I don't want to modify any code in the pytube package. Make sure cipher is imported globally (i.e. at the top of your code, not inside a function or class) and it will work.

from typing import List

from pytube import cipher
from pytube.exceptions import RegexMatchError  # needed by the raise below

def fix(self, js: str):  # this is __init__(self, js: str) copied from the class Cipher
    import re  # added because re isn't imported here
    self.transform_plan: List[str] = cipher.get_transform_plan(js)  # "cipher." is added to every function call below so that it resolves
    var_regex = re.compile(r"^[\w\$_]+\W")  # the line 30 modification is done here
    var_match = var_regex.search(self.transform_plan[0])
    if not var_match:
        raise RegexMatchError(
            caller="__init__", pattern=var_regex.pattern
        )
    var = var_match.group(0)[:-1]
    self.transform_map = cipher.get_transform_map(js, var)  # added "cipher."
    self.js_func_patterns = [
        r"\w+\.(\w+)\(\w,(\d+)\)",
        r"\w+\[(\"\w+\")\]\(\w,(\d+)\)"
    ]

    self.throttling_plan = cipher.get_throttling_plan(js)  # added "cipher."
    self.throttling_array = cipher.get_throttling_function_array(js)  # added "cipher."

    self.calculated_n = None

cipher.Cipher.__init__ = fix  # override class Cipher's __init__ with my fix

# Your own code starts here...

@mahdjourOussama

If the CustomCipher substitution is never called for you, I think it is better to fork the repo, change line 30 in cipher.py, and then install from your fork like this:
pip install "pytube @ git+https://github.com/[username]/pytube"
This solution requires git to be installed in your Docker container first, so add these commands before that step:

RUN apt-get update
RUN apt-get -y install git

@davidtwchn

A new release is the best solution.
