RegexMatchError: __init__: could not find match for ^\w+\W #1918
Comments
Thank you for contributing to PyTube. Please remember to reference Contributing.md |
Same issue here. It worked earlier today, and I have had this error for the past few hours on different machines. It seems YouTube may have changed something that breaks PyTube. |
Same issue here! It worked fine until yesterday (on my side): venv/lib/python3.10/site-packages/pytube/cipher.py", line 33, in __init__ |
same issue here! It seems like there's an issue with pytube. I've been using this library to work on my graduation project, and I would greatly appreciate it if you could fix it as soon as possible. It's really urgent. |
+1 on this issue |
+1 today |
My graduation project discussion is in 4 days, please fix it as soon as possible |
+1 on this |
+1, let's thank youtube for changing their function names 🙏 |
Same error here:
My logs in docker. |
Just as a suggestion: my playlist is only about 2+ hours long, so I downloaded all of it another way and am bypassing pytube for now. The work has to get done and I can't wait any longer. |
I very quickly migrated to https://github.com/yt-dlp/yt-dlp which seems more mature anyway. I am using it like this (using a named temporary directory and then returning bytes to be rendered by my request, adjust it to your own needs):
import tempfile

from yt_dlp import YoutubeDL, DownloadError


def download_youtube_mp3(youtube_id, temporary_directory):
    url = f"https://www.youtube.com/watch?v={youtube_id}"
    ydl_opts = {
        "format": "bestaudio/best",
        "max_filesize": 20 * 1024 * 1024,  # skip files larger than 20 MiB
        "outtmpl": f"{temporary_directory}/%(id)s.%(ext)s",
        "noplaylist": True,
        "verbose": True,
        # Convert the downloaded audio to MP3 (requires FFmpeg)
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }],
    }
    ydl = YoutubeDL(ydl_opts)
    try:
        meta = ydl.extract_info(
            url,
            download=True,
        )
    except DownloadError as e:
        raise e
    else:
        video_id = meta["id"]
        return {
            "title": meta["title"],
            "file_path": f"{temporary_directory}/{video_id}.mp3"
        }


# Hypothetical wrapper name; the original snippet ran inside the poster's request handler
def get_audio_bytes(youtube_id):
    with tempfile.TemporaryDirectory() as temporary_directory:
        audio_details = download_youtube_mp3(youtube_id, temporary_directory)
        with open(audio_details["file_path"], "rb") as f:
            bytes_data = f.read()
        return bytes_data |
+1 |
Found a quick fix: it turns out the current function name includes "$O", and "$" is a non-word character which will not match "\w+". Edited to include @ghrist8p's comment: as per ECMA's spec, valid identifiers may include the symbols "$" and "_" in addition to word characters. Change cipher.py line 30 to: var_regex = re.compile(r"^[\w$_]+\W")
I do not know all the possible patterns, but this is working for me. |
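To see why the original pattern fails, here is a minimal sketch comparing the two regexes. The two patterns are the ones quoted in this thread; the sample strings are made up for illustration, since the real transform plan entries are not shown here.

```python
import re

OLD_PATTERN = re.compile(r"^\w+\W")      # original line 30 in cipher.py
NEW_PATTERN = re.compile(r"^[\w$_]+\W")  # fix suggested in this thread


def extract_var(pattern, first_step):
    """Return the leading identifier of a transform step, or None if no match."""
    match = pattern.search(first_step)
    # Drop the trailing non-word character, as cipher.py does with [:-1]
    return match.group(0)[:-1] if match else None


# Hypothetical transform-plan entries, for illustration only
plain_step = "DE.AJ(a,15)"
dollar_step = "$O.AJ(a,15)"

print(extract_var(OLD_PATTERN, plain_step))
print(extract_var(OLD_PATTERN, dollar_step))
print(extract_var(NEW_PATTERN, dollar_step))
```

The old pattern returns None as soon as the identifier starts with "$", which is the case where pytube raises RegexMatchError.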
Worked. |
how would I get it to work in Colab? |
As per the spec an identifier can have a |
Thank you for your help! I'm also looking forward to official updates from the program side. |
bless you |
Where did you enter var_regex = re.compile(r"^[\w$_]+\W")? It wasn't in the pytube code I was using. I'm getting an error in this code:
|
cipher.py on line 30. You can check my repo commit. |
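If you are not sure which cipher.py your environment actually loads (a virtualenv, Colab, or a Docker image may each have their own copy), a small sketch like the following can print its location. The helper name locate_module_file is made up for this example; it only uses the standard library.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the file path of an importable module, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. pytube) is missing
        return None
    return spec.origin if spec is not None else None


# Prints the path of the installed cipher.py, or None if pytube is not installed
print(locate_module_file("pytube.cipher"))
```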
Thank you for your help. |
I fixed this bug today. |
+1 |
I've tried all of the above codes, but they don't work. They worked until yesterday. Working on Colab.
|
@coco1718 var_regex = re.compile(r"^[\w$_]+\W") still works for me somehow |
I think there's another problem, but I can't find it |
I solved with this code:
import requests
from pytube import Playlist, YouTube

# NOTE: table_punctuation and table_digit are str.translate() tables defined
# elsewhere in the poster's code (not shown in this comment).


def download_stream(url, filename):
    """
    Downloads a video stream from a URL and saves it to a file.
    Args:
        url: The URL of the video stream.
        filename: The name of the file to save the video to.
    """
    # Set chunk size for efficient downloading
    chunk_size = 1024
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()  # Raise an exception for failed requests
        # Check for content type (video or not)
        content_type = response.headers.get('Content-Type', '')
        if not content_type.startswith('video/'):
            print(f"Warning: Content type '{content_type}' might not be a video.")
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size):
                # Filter out keep-alive new chunks
                if chunk:
                    f.write(chunk)
        print(f"Download complete! Saved to: {filename}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading video: {e}")


# Function to download a video from YouTube using pytube
def download_video_by_pytube(url, base_path="//content//drive//MyDrive//YouTube//Download_video//",
                             playlist_start=1,
                             dl_pytube_stream=True,
                             skip_download=False,
                             max_resolution=1280,
                             target_languages=["en"], translated_lang=[]):
    # Check if the URL is for a playlist
    if 'list=' in url.lower():
        playlist = Playlist(url)
        folder_name = f"{playlist.title}_{playlist.owner}"
        playlist = list(playlist.video_urls)
    else:
        playlist = []
        playlist.append(url)
        youtube = YouTube(url)
        folder_name = f"{youtube.title}_{youtube.author}"
    for i, video_url in enumerate(playlist[playlist_start-1:]):
        youtube = YouTube(video_url)
        # Get the name of the video
        title = youtube.title.translate(table_punctuation).translate(table_digit)
        j = str(i+playlist_start).zfill(3)
        filename = f"{j}_{title}"
        print(f"{int(j)}-th from {len(playlist)} : {title}")
        filename = f"{filename}.mp4"
        try:
            if not dl_pytube_stream:
                # Bypass pytube's stream handling and fetch the raw URL directly
                video_dict = {}
                for streaming_data in youtube.streaming_data['formats']:
                    if 'video/mp4;' in streaming_data['mimeType']:
                        video_dict[streaming_data['url']] = int(streaming_data['qualityLabel'].split('p')[0])
                # Keep only resolutions up to max_resolution, highest first
                video_dict = dict(filter(lambda x: x[1] <= max_resolution, video_dict.items()))
                videos = sorted(video_dict.items(), key=lambda x: x[1], reverse=True)
                # print(videos[0][0])
                download_stream(videos[0][0], filename)
            else:
                # Get the streams that have a resolution
                streams = [stream for stream in youtube.streams.filter(progressive=True, file_extension='mp4') if stream.resolution]
                print('************************')
                # Sort the streams by resolution in descending order
                streams.sort(key=lambda stream: int(stream.resolution.replace('p', '')), reverse=True)
                # Find the first stream that has a resolution less than or equal to max_resolution
                stream = next((stream for stream in streams if int(stream.resolution.replace('p', '')) <= max_resolution), None)
                # Download the video and specify the filename
                if stream:
                    if not skip_download:
                        # Pass filename by keyword; the first positional argument is output_path
                        stream.download(filename=filename)
                    print("Downloaded resolution is:", stream.resolution)
            print('_'*50)
        except Exception as e:
            print(e)
you must set |
+1 |
No problema, @mahdjourOussama ⭐ |
Thanks a lot! Finally got it working |
please merge the fix |
+1 can confirm bug |
+1 I'm getting the same bug in Colab as well. |
+1 I'm getting the same bug in Colab |
If you just want to monkey patch in the fix, put this file somewhere and then import it before you use pytube. It's a copy of the normal pytube Most of the file isn't necessary, and is just there because it's a straight copy of |
Working perfectly after a session reboot in Colab |
For those who are trying to solve this in their Docker container (I suppose it works for Google Colab as well, but I'm not sure):
|
Updated the following code:
Commented out line 30: var_regex = re.compile(r"^\w+\W")
Added the following line: var_regex = re.compile(r"^[\w$_]+\W")
This fixes the following issue: pytube#1918
RegexMatchError: __init__: could not find match for ^\w+\W - see pytube#1918
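For containers or notebooks where editing the file by hand is awkward, the same one-line change can be scripted. The sketch below is only one way to do it: the function names are made up for this example, and it assumes the installed cipher.py still contains the old line exactly as quoted in this thread.

```python
from pathlib import Path

OLD_LINE = r'var_regex = re.compile(r"^\w+\W")'
NEW_LINE = r'var_regex = re.compile(r"^[\w$_]+\W")'


def patch_cipher_text(source):
    """Replace the old regex line with the fixed one; return (new_source, changed)."""
    if OLD_LINE not in source:
        # Already patched, or the file differs from what this thread describes
        return source, False
    return source.replace(OLD_LINE, NEW_LINE), True


def patch_cipher_file(path):
    """Patch the file at `path` in place; return True if it was modified."""
    path = Path(path)
    new_source, changed = patch_cipher_text(path.read_text(encoding="utf-8"))
    if changed:
        path.write_text(new_source, encoding="utf-8")
    return changed
```

You would call patch_cipher_file with the path of the installed cipher.py (for example as a build step after installing pytube). Running it twice is harmless, since the second run finds nothing to replace.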
It worked perfectly! |
Not for me though somehow, so I made this code:
from typing import List

from pytube import cipher
from pytube.exceptions import RegexMatchError  # needed for the raise below


def fix(self, js: str):
    import re  # this is added since re isn't imported
    self.transform_plan: List[str] = cipher.get_transform_plan(js)  # added "cipher." to every function below to allow it to be called
    var_regex = re.compile(r"^[\w\$_]+\W")  # the line 30 modification is done
    var_match = var_regex.search(self.transform_plan[0])
    if not var_match:
        raise RegexMatchError(
            caller="__init__", pattern=var_regex.pattern
        )
    var = var_match.group(0)[:-1]
    self.transform_map = cipher.get_transform_map(js, var)
    self.js_func_patterns = [
        r"\w+\.(\w+)\(\w,(\d+)\)",
        r"\w+\[(\"\w+\")\]\(\w,(\d+)\)"
    ]
    self.throttling_plan = cipher.get_throttling_plan(js)
    self.throttling_array = cipher.get_throttling_function_array(js)
    self.calculated_n = None


cipher.Cipher.__init__ = fix
and this worked for me. Remember to import it in a global environment and don't import cipher (or from pytube import *) again |
Worked for me, too. Running on Kaggle. Thanks! |
thanks, this method works fine and seems more stable |
I created a new file named custom_cipher and renamed the class Cipher to CustomCipher. After that I set cipher.Cipher = CustomCipher, but in Docker my CustomCipher class is not getting called. Please suggest. |
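One possible explanation, offered as a guess and not verified against the pytube source: if another pytube module imported the class with "from ... import Cipher", it keeps its own reference to the original class, so rebinding cipher.Cipher has no effect there. Replacing __init__ on the class object itself, as the other comments do, changes the class that every module already holds. The sketch below uses stand-in objects (no pytube needed) to show the difference.

```python
import types


class Cipher:
    def __init__(self):
        self.source = "original"


# Stand-in for the cipher module
cipher_module = types.SimpleNamespace(Cipher=Cipher)
# Stand-in for a module that ran "from cipher import Cipher" at import time
consumer_module = types.SimpleNamespace(Cipher=cipher_module.Cipher)


class CustomCipher:
    def __init__(self):
        self.source = "custom"


# Approach 1: rebind the name on the cipher module only
cipher_module.Cipher = CustomCipher
after_rebind = consumer_module.Cipher().source  # consumer still holds the old class


# Approach 2: replace __init__ on the original class object
def fixed_init(self):
    self.source = "patched"


Cipher.__init__ = fixed_init
after_patch = consumer_module.Cipher().source

print(after_rebind, after_patch)
```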
Hello
|
Add this at the top of your code and it will override the cipher. I made it because I don't want to modify any code in the pytube package. Make sure the cipher is imported globally (i.e. not inside a sub-program or class, but at the top of your code) and it will work.
from typing import List

from pytube import cipher
from pytube.exceptions import RegexMatchError  # needed for the raise below


def fix(self, js: str):  # this is __init__(self, js: str) copied from the class Cipher
    import re  # this line is added since re isn't imported
    self.transform_plan: List[str] = cipher.get_transform_plan(js)  # added "cipher." to every function below to allow it to be correctly called
    var_regex = re.compile(r"^[\w\$_]+\W")  # the line 30 modification is done here
    var_match = var_regex.search(self.transform_plan[0])
    if not var_match:
        raise RegexMatchError(
            caller="__init__", pattern=var_regex.pattern
        )
    var = var_match.group(0)[:-1]
    self.transform_map = cipher.get_transform_map(js, var)  # added "cipher."
    self.js_func_patterns = [
        r"\w+\.(\w+)\(\w,(\d+)\)",
        r"\w+\[(\"\w+\")\]\(\w,(\d+)\)"
    ]
    self.throttling_plan = cipher.get_throttling_plan(js)  # added "cipher."
    self.throttling_array = cipher.get_throttling_function_array(js)  # added "cipher."
    self.calculated_n = None


cipher.Cipher.__init__ = fix  # override class Cipher's __init__ with my fix
# Your own code starts here... |
If you face this issue, I think it is better to fork the repo, change line 33 in the cipher file, and then download from your own repo instead, like this:
|
A new release is the best solution. |
I have changed cipher.py as suggested in many past issues, but it doesn't fix it for me.