Cross-Service YT-DLP Scraping Util

rainbownapkin commented

2025-05-04 23:23:00 -04:00

(Migrated from gitlab.com)

Create a local YT-DLP Metadata Scraping Util.

Unfortunately, reliably working Piped/Invidious instances with public IPs have pretty much vanished; the few that remain have disabled their API and added captchas in desperation.

This is our quickest/easiest option for pulling YT data ourselves without registering some bullshit API key.

Otherwise, if our IP gets banned by Google, we may have to resort to hosting our own Piped/Invidious instance locally (either on the machine or on the LAN) while running it off of a separate global IP. Even then, this util would still be good for other sources, or for smaller Canopy instances which might slip through the cracks.
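A minimal sketch of what a local metadata pull could look like, assuming the `yt-dlp` executable is on the PATH and is invoked as a subprocess. This is Python for illustration only and is not Canopy's actual implementation; `MediaMeta`, `build_command`, `parse_metadata` and `scrape` are hypothetical names. The flags and JSON keys used (`--dump-single-json`, `--skip-download`, `--no-playlist`, `id`, `title`, `duration`, `webpage_url`) are standard yt-dlp ones.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaMeta:
    """Hypothetical metadata record; not Canopy's actual media object."""
    id: str
    title: str
    duration: Optional[float]  # seconds; may be absent, e.g. for livestreams
    page_url: str


def build_command(url: str) -> list[str]:
    # --dump-single-json prints one JSON document describing the URL,
    # --skip-download avoids fetching any media,
    # --no-playlist keeps the pull to a single video.
    return ["yt-dlp", "--dump-single-json", "--skip-download", "--no-playlist", url]


def parse_metadata(raw_json: str) -> MediaMeta:
    # Pick out only the fields this sketch cares about.
    info = json.loads(raw_json)
    return MediaMeta(
        id=info["id"],
        title=info.get("title", ""),
        duration=info.get("duration"),
        page_url=info.get("webpage_url", ""),
    )


def scrape(url: str, timeout: float = 30.0) -> MediaMeta:
    # Raises CalledProcessError if yt-dlp exits non-zero (blocked IP, bad URL, ...).
    result = subprocess.run(
        build_command(url),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return parse_metadata(result.stdout)
```

Keeping the command builder and the JSON parser separate from the subprocess call means both can be exercised without touching the network.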


rainbownapkin commented

2025-05-04 23:23:00 -04:00

(Migrated from gitlab.com)

added #124 as parent issue

rainbownapkin commented

2025-05-04 23:24:19 -04:00

(Migrated from gitlab.com)

changed the description

rainbownapkin (Migrated from gitlab.com) closed this issue

2025-05-06 07:50:16 -04:00

rainbownapkin commented

2025-05-06 07:50:17 -04:00

(Migrated from gitlab.com)

Canopy custom media object creation from YT-DLP is partially implemented: 0ce0685fd5

Still need to refresh on queue from playlist, or upon playback if it was queued more than a couple of minutes ago.

Also need to fix queueing functions to wait until after media metadata has arrived to set the start date if starting now.


rainbownapkin (Migrated from gitlab.com) reopened this issue

2025-05-06 07:50:35 -04:00

rainbownapkin commented

2025-05-06 07:57:16 -04:00

(Migrated from gitlab.com)

mentioned in task #127

rainbownapkin commented

2025-05-06 07:57:49 -04:00

(Migrated from gitlab.com)

changed title from YTDLP Backup Puller to YT-DLP Scraping Util


rainbownapkin commented

2025-05-06 07:57:58 -04:00

(Migrated from gitlab.com)

changed title from YT-DLP Scraping Util to Cross-Service YT-DLP Scraping Util


rainbownapkin commented

2025-05-06 07:58:27 -04:00

(Migrated from gitlab.com)

changed the description

rainbownapkin commented

2025-05-06 08:01:50 -04:00

(Migrated from gitlab.com)

changed the description

rainbownapkin commented

2025-05-06 08:02:43 -04:00

(Migrated from gitlab.com)

changed the description

rainbownapkin commented

2025-05-06 08:02:58 -04:00

(Migrated from gitlab.com)

changed the description

rainbownapkin commented

2025-05-06 08:03:10 -04:00

(Migrated from gitlab.com)

changed the description

rainbownapkin commented

2025-05-06 21:15:41 -04:00

(Migrated from gitlab.com)

YouTube videos now refresh their raw file link 10 seconds before playback starts to ensure it's ready to be sent to the client. In the event the pull takes longer than 10 seconds, the update will be sent out live to the clients so they can hotswap the video and play the remainder: 60cd21d938
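The timing described above can be sketched as follows. This is an illustration in Python, not the code from the commit; `RefreshOutcome`, `refresh_time` and `classify_refresh` are hypothetical names, and all times are assumed to be Unix timestamps in seconds. Only the 10-second lead comes from the comment itself.

```python
from dataclasses import dataclass

REFRESH_LEAD_SECONDS = 10.0  # lead time mentioned in the comment above


@dataclass
class RefreshOutcome:
    ready_before_start: bool
    # Seconds into the item at which clients would hotswap; 0.0 if ready in time.
    hotswap_offset: float


def refresh_time(start_at: float) -> float:
    # The refresh is kicked off a fixed lead time before playback begins.
    return start_at - REFRESH_LEAD_SECONDS


def classify_refresh(start_at: float, finished_at: float) -> RefreshOutcome:
    # Pull finished in time: the fresh link goes out with the normal start.
    if finished_at <= start_at:
        return RefreshOutcome(True, 0.0)
    # Pull ran long: the link arrives mid-playback, so clients swap sources
    # and play the remainder from the current offset.
    return RefreshOutcome(False, finished_at - start_at)
```

For example, `classify_refresh(100.0, 103.5)` describes a pull that finished 3.5 seconds after the scheduled start, which is the hotswap case.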


rainbownapkin commented

2025-05-06 21:23:45 -04:00

(Migrated from gitlab.com)

Still need to refresh raw links for:

- Items re-hydrated from playlist or after server restart that begin in less than 10 seconds
- Items re-hydrated after server restart that begin in less than 10 seconds or have already started
- Items which were already queued, and then moved to start within 10 seconds

-- OR --

Rewrite refreshRawLink() to check link expiration and refresh only when required, then add a refreshRawLink() call for items scheduled to start within 10 seconds.

This last option is probably the cleanest, and solves other issues while also reducing the number of YouTube scrapes.
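A rough sketch of that second option, in Python for illustration. The thread does not say how link expiration is detected; this sketch assumes the raw link carries its expiry as a Unix timestamp in an `expire` query parameter, which is how YouTube's raw media URLs usually look. `link_expiry`, `needs_refresh`, `QueuedItem` and `items_to_refresh` are hypothetical names, not Canopy's.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse

PRE_SWITCH_SECONDS = 10.0  # the 10-second window from the comment above


def link_expiry(raw_link: str) -> Optional[int]:
    """Return the Unix timestamp from the link's `expire` parameter (assumed), if present."""
    values = parse_qs(urlparse(raw_link).query).get("expire")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


def needs_refresh(raw_link: Optional[str], play_until: float) -> bool:
    # No link yet, or no readable expiry: refresh to be safe.
    if not raw_link:
        return True
    expiry = link_expiry(raw_link)
    # Refresh only if the link would expire before the item finishes playing.
    return expiry is None or expiry <= play_until


@dataclass
class QueuedItem:
    start_at: float  # Unix timestamp, seconds
    duration: float  # seconds
    raw_link: Optional[str]


def items_to_refresh(queue: list[QueuedItem], now: float) -> list[QueuedItem]:
    # Covers items starting within the window as well as ones already started.
    return [
        item
        for item in queue
        if item.start_at <= now + PRE_SWITCH_SECONDS
        and needs_refresh(item.raw_link, item.start_at + item.duration)
    ]
```

Because the check is driven by expiry rather than by how the item got into the queue, the re-hydrated and rescheduled cases listed above fall out of the same path, and links that are still valid are never scraped again.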


rainbownapkin (Migrated from gitlab.com) closed this issue

2025-05-06 22:47:54 -04:00

rainbownapkin commented

2025-05-06 22:47:55 -04:00

(Migrated from gitlab.com)

refreshRawLink() now checks link expiration to make sure the link actually requires a refresh, and scheduleMedia() refreshes any items that start before the preSwitchDelta: 2a3740dece

Looks like YouTube scraping for YT-DLP is complete. I'll play around with Dailymotion and Vimeo before continuing...


rainbownapkin (Migrated from gitlab.com) reopened this issue

2025-05-06 22:48:44 -04:00

rainbownapkin commented

2025-05-06 23:32:06 -04:00

(Migrated from gitlab.com)

Dailymotion looks workable, though we'll need to get HLS playback working before we can get it playing properly. HLS would also benefit YouTube playback, on top of being a requirement for livestreams, so that'll be a big improvement.

Vimeo seems to pull unreliably through YT-DLP; however, it has an API that works just fine anonymously, which should at least be enough to get their official embed working...
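The comment doesn't name which Vimeo API is meant. One candidate that needs no key is Vimeo's public oEmbed endpoint, so the sketch below assumes that one; it may not be what ends up being used. It is Python for illustration, and `oembed_url`, `embed_html` and `fetch_embed` are hypothetical names. An oEmbed response for a video carries the embed markup in its `html` field.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Vimeo's public oEmbed endpoint, queried anonymously.
OEMBED_ENDPOINT = "https://vimeo.com/api/oembed.json"


def oembed_url(video_url: str) -> str:
    # The video page URL is passed, percent-encoded, as the `url` parameter.
    return f"{OEMBED_ENDPOINT}?{urlencode({'url': video_url})}"


def embed_html(oembed_response: str) -> str:
    # oEmbed "video" responses include ready-made embed markup under "html".
    return json.loads(oembed_response)["html"]


def fetch_embed(video_url: str, timeout: float = 10.0) -> str:
    # Network call; raises urllib.error.URLError / HTTPError on failure.
    with urlopen(oembed_url(video_url), timeout=timeout) as response:
        return embed_html(response.read().decode("utf-8"))
```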


rainbownapkin (Migrated from gitlab.com) closed this issue

2025-05-06 23:32:11 -04:00

Cross-Service YT-DLP Scraping Util #126