The first step is to create an account at
https://www.itv.com/hub/user/signin
The important part is to provide a valid UK postcode
https://fr.wikipedia.org/wiki/Codes_postaux_britanniques#Les_codes_au_Royaume-Uni
and a valid email address.
Next, you need a browser such as Chrome with the Hola extension, so that you appear to have a UK IP address.
The protocol used by
http://www.itv.com is rtmp; to retrieve the video, follow the method described at
https://forum.ubuntu-fr.org/viewtopic.php?id=1459551
For the subtitles, run an ngrep command along these lines (assuming your network interface is wlan0):
sudo ngrep -d wlan0 -lqi -p -W none ^get\|^post > a
then start the video and click the
S
at the bottom right.
The subtitle file is an .xml file, so search for
subtitles
in the file created by ngrep.
Example:
$ grep subtitle a
GET http://subtitles.secure.content.itv.com/crossdomain.xml HTTP/1.1..Host: subtitles.secure.content.itv.com..Proxy-Connection: keep-alive..Proxy-Authorization: Basic dXNlci11dWlkLTM0MzMyNmU5Y2RmZTExOWVjNGE2ZmMzMGI5ZTc4NTY2OjhlZmY5YjAxMGMyMA==..User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36..X-Requested-With: ShockwaveFlash/23.0.0.207..Accept: */*..Referer: http://www.itv.com/hub/blue-murder/Ya2324a0007..Accept-Encoding: gzip, deflate, sdch..Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4..Cookie: Itv.CookiePolicy=accepted; LiwioReferrer=http%3A//www.itv.com/hub/categories; _ga=GA1.2.1528679458.1470295331; __utmt_UA-17825253-53=1; Itv.Region=ITV|null; ABTastySession=LiwioHashMRASN%3Anull%5E%7C%5ELiwioUTMC%3A1; Itv.Session={%22tokens%22:{%22content%22:{%22entitlement%22:{%22purchased%22:[]%2C%22failed_availability_checks%22:[]}%2C%22access_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJlbnRpdGxlbWVudHMiOltdLCJzdWIiOiJjYmI3YzZlZi1jM2E1LTRlZmMtYjc0ZS1hM2IyZWEyMjIxNGYiLCJicm9hZGNhc3RlciI6IklUViIsInNjb3BlIjoiY29udGVudCIsIm5hbWUiOiJoZW5yaSIsImlzcyI6Imh0dHBzOlwvXC9hdXRoLml0di5jb20iLCJleHAiOjE0ODAyNjYwMDgsIm5vbmNlIjoiajJNUW5welI0VUExVkZxdlk4WVYiLCJpYXQiOjE0ODAxNzk2MDh9.d_oaibE2OAlLhfqrT6KaZFvKbuwrlowKo899n8AALZ4%22%2C%22token_type%22:%22bearer%22%2C%22refresh_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczpcL1wvYXV0aC5pdHYuY29tIiwic3ViIjoiY2JiN2M2ZWYtYzNhNS00ZWZjLW
GET /crossdomain.xml HTTP/1.1..Host: subtitles.secure.content.itv.com..Connection: keep-alive..User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36..Accept: */*..Accept-Encoding: gzip, deflate, sdch..Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4..Cookie: Itv.CookiePolicy=accepted; LiwioReferrer=http%3A//www.itv.com/hub/categories; _ga=GA1.2.1528679458.1470295331; __utmt_UA-17825253-53=1; Itv.Region=ITV|null; ABTastySession=LiwioHashMRASN%3Anull%5E%7C%5ELiwioUTMC%3A1; Itv.Session={%22tokens%22:{%22content%22:{%22entitlement%22:{%22purchased%22:[]%2C%22failed_availability_checks%22:[]}%2C%22access_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJlbnRpdGxlbWVudHMiOltdLCJzdWIiOiJjYmI3YzZlZi1jM2E1LTRlZmMtYjc0ZS1hM2IyZWEyMjIxNGYiLCJicm9hZGNhc3RlciI6IklUViIsInNjb3BlIjoiY29udGVudCIsIm5hbWUiOiJoZW5yaSIsImlzcyI6Imh0dHBzOlwvXC9hdXRoLml0di5jb20iLCJleHAiOjE0ODAyNjYwMDgsIm5vbmNlIjoiajJNUW5welI0VUExVkZxdlk4WVYiLCJpYXQiOjE0ODAxNzk2MDh9.d_oaibE2OAlLhfqrT6KaZFvKbuwrlowKo899n8AALZ4%22%2C%22token_type%22:%22bearer%22%2C%22refresh_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczpcL1wvYXV0aC5pdHYuY29tIiwic3ViIjoiY2JiN2M2ZWYtYzNhNS00ZWZjLWI3NGUtYTNiMmVhMjIyMTRmIiwibm9uY2UiOiJqMk1RbnB6UjRVQTFWRnF2WThZViIsInNjb3BlIjoiY29udGVudCIsImF1dGhfdGltZSI6MTQ3MDI5NTM1MTM0Mn0.VKxivyOSBBJcO61415Zcp9lhTQbgqqfLDjn4qowCmJI%22}}%2C%22sticky%22:true}; ABTasty=LPT122147%3A170983.1470294558%5E%7C%5ELPT122148
GET http://subtitles.secure.content.itv.com/CATCHUP/e08741b0/a26b/4502/ab1a/f59197029995/Y-2324-0007-002_BlueMurder_TX241116.xml?__gda__=1480188001_0e3e3072dc807b88561b814b0fcf166e&fileExt=.xml HTTP/1.1..Host: subtitles.secure.content.itv.com..Proxy-Connection: keep-alive..Proxy-Authorization: Basic dXNlci11dWlkLTM0MzMyNmU5Y2RmZTExOWVjNGE2ZmMzMGI5ZTc4NTY2OjhlZmY5YjAxMGMyMA==..User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36..X-Requested-With: ShockwaveFlash/23.0.0.207..Accept: */*..Referer: http://www.itv.com/hub/blue-murder/Ya2324a0007..Accept-Encoding: gzip, deflate, sdch..Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4..Cookie: Itv.CookiePolicy=accepted; LiwioReferrer=http%3A//www.itv.com/hub/categories; _ga=GA1.2.1528679458.1470295331; __utmt_UA-17825253-53=1; Itv.Region=ITV|null; ABTastySession=LiwioHashMRASN%3Anull%5E%7C%5ELiwioUTMC%3A1; Itv.Session={%22tokens%22:{%22content%22:{%22entitlement%22:{%22purchased%22:[]%2C%22failed_availability_checks%22:[]}%2C%22access_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJlbnRpdGxlbWVudHMiOltdLCJzdWIiOiJjYmI3YzZlZi1jM2E1LTRlZmMtYjc0ZS1hM2IyZWEyMjIxNGYiLCJicm9hZGNhc3RlciI6IklUViIsInNjb3BlIjoiY29udGVudCIsIm5hbWUiOiJoZW5yaSIsImlzcyI6Imh0dHBzOlwvXC9hdXRoLml0di5jb20iLCJleHAiOjE0ODAyNjYwMDgsIm5vbmNlIjoiajJNUW5welI0VUExVkZxdlk4WVYiLCJpYXQiOjE0ODAxNzk2MDh9.d_oaibE2OAlLhfqrT6KaZFvKbuwrlowKo899n8AALZ4%22%2C%22token_type%22:%22
GET /CATCHUP/e08741b0/a26b/4502/ab1a/f59197029995/Y-2324-0007-002_BlueMurder_TX241116.xml?__gda__=1480188001_0e3e3072dc807b88561b814b0fcf166e&fileExt=.xml HTTP/1.1..Host: subtitles.secure.content.itv.com..Connection: keep-alive..User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36..Accept: */*..Accept-Encoding: gzip, deflate, sdch..Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4..Cookie: Itv.CookiePolicy=accepted; LiwioReferrer=http%3A//www.itv.com/hub/categories; _ga=GA1.2.1528679458.1470295331; __utmt_UA-17825253-53=1; Itv.Region=ITV|null; ABTastySession=LiwioHashMRASN%3Anull%5E%7C%5ELiwioUTMC%3A1; Itv.Session={%22tokens%22:{%22content%22:{%22entitlement%22:{%22purchased%22:[]%2C%22failed_availability_checks%22:[]}%2C%22access_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJlbnRpdGxlbWVudHMiOltdLCJzdWIiOiJjYmI3YzZlZi1jM2E1LTRlZmMtYjc0ZS1hM2IyZWEyMjIxNGYiLCJicm9hZGNhc3RlciI6IklUViIsInNjb3BlIjoiY29udGVudCIsIm5hbWUiOiJoZW5yaSIsImlzcyI6Imh0dHBzOlwvXC9hdXRoLml0di5jb20iLCJleHAiOjE0ODAyNjYwMDgsIm5vbmNlIjoiajJNUW5welI0VUExVkZxdlk4WVYiLCJpYXQiOjE0ODAxNzk2MDh9.d_oaibE2OAlLhfqrT6KaZFvKbuwrlowKo899n8AALZ4%22%2C%22token_type%22:%22bearer%22%2C%22refresh_token%22:%22eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczpcL1wvYXV0aC5pdHYuY29tIiwic3ViIjoiY2JiN2M2ZWYtYzNhNS00ZWZjLWI3NGUtYTNiMmVhMjIyMTRmIiwibm9uY2UiOiJqMk1RbnB6UjRVQTFWRnF2WThZViIsInNjb3BlIjoiY29udGVudCIsImF1dGhfdGltZSI6MTQ3MDI5NTM1
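The search can also be scripted. What follows is a minimal Python 3 sketch, not part of the original method. It assumes the capture file holds one request per line, as in the example above, where ngrep with -W none shows each CRLF as "..". The function name subtitle_urls and the two shortened sample lines are made up for illustration.

```python
import re

# Request line at the start of a captured line: method, target, HTTP version.
REQUEST_RE = re.compile(r'^(?:GET|POST)\s+(\S+)\s+HTTP/1\.[01]', re.IGNORECASE)
# Host header; it ends at the ".." that ngrep prints in place of CRLF.
HOST_RE = re.compile(r'Host:\s*([A-Za-z0-9.-]+?)\.\.')


def subtitle_urls(capture_lines):
    """Return the subtitle .xml URLs found in ngrep output, without duplicates."""
    urls = []
    for line in capture_lines:
        request = REQUEST_RE.match(line.lstrip())
        if request is None:
            continue
        target = request.group(1)
        if target.startswith('/'):
            # Relative request: rebuild the full URL from the Host header.
            host = HOST_RE.search(line)
            if host is None:
                continue
            target = 'http://' + host.group(1) + target
        path = target.split('?', 1)[0]
        wanted = ('subtitles' in target and path.endswith('.xml')
                  and not path.endswith('crossdomain.xml'))
        if wanted and target not in urls:
            urls.append(target)
    return urls


# Hypothetical, shortened capture lines in the same shape as the real ones.
sample = [
    "GET http://subtitles.secure.content.itv.com/crossdomain.xml HTTP/1.1.."
    "Host: subtitles.secure.content.itv.com..",
    "GET /CATCHUP/x/ep.xml?__gda__=1_ab&fileExt=.xml HTTP/1.1.."
    "Host: subtitles.secure.content.itv.com..Connection: keep-alive..",
]
for url in subtitle_urls(sample):
    print(url)
```

On the real capture this would be called as subtitle_urls(open('a')), since the ngrep command above writes to a file named a.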
So we run a wget along these lines:
$ wget -O subtitles.xml http://subtitles.secure.content.itv.com/CATCHUP/e08741b0/a26b/4502/ab1a/f59197029995/Y-2324-0007-002_BlueMurder_TX241116.xml
--2016-11-26 18:23:23-- http://subtitles.secure.content.itv.com/CATCHUP/e08741b0/a26b/4502/ab1a/f59197029995/Y-2324-0007-002_BlueMurder_TX241116.xml
Résolution de subtitles.secure.content.itv.com (subtitles.secure.content.itv.com)… 2.16.117.67, 2.16.117.57, 2.16.117.66, ...
Connexion à subtitles.secure.content.itv.com (subtitles.secure.content.itv.com)|2.16.117.67|:80… connecté.
requête HTTP transmise, en attente de la réponse… 200 OK
Taille : 485162 (474K) [text/xml]
Enregistre : «subtitles.xml»
subtitles.xml 100%[======================================================================================================>] 473,79K 144KB/s ds 3,3s
2016-11-26 18:23:27 (144 KB/s) - «subtitles.xml» enregistré [485162/485162]
$ ll subtitles.xml
-rw-rw-r-- 1 gg gg 485162 nov. 23 23:16 subtitles.xml
$
This .xml file looks like this:
<?xml version="1.0" encoding="UTF-16"?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:frameRate="25">
<head><styling><style id="Swift" tts:color="white" tts:textAlign="center" tts:extent="720px 576px" tts:fontFamily="Courier New" tts:fontSize="18"/></styling></head>^M
<body>
<div style ="Swift">
<p xml:id="0" xml:space="preserve" begin="00:01:06:00" end="00:01:08:14" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Spartacus, come on, boy!</p>
<p xml:id="1" xml:space="preserve" begin="00:01:10:16" end="00:01:12:16" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Spartacus, come on!</p>
<p xml:id="2" xml:space="preserve" begin="00:01:15:02" end="00:01:17:08" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Come on, boy. Spartacus! Here, boy.</p>
<p xml:id="3" xml:space="preserve" begin="00:01:20:00" end="00:01:22:19" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Paula! Where are you?</p>
<p xml:id="4" xml:space="preserve" begin="00:01:24:04" end="00:01:26:24" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Spartacus? Here, boy!</p>
<p xml:id="5" xml:space="preserve" begin="00:01:27:00" end="00:01:30:17" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Paula! Paula, where are you?</p>
<p xml:id="6" xml:space="preserve" begin="00:01:30:18" end="00:01:32:18" tts:backgroundColor="black" tts:fontSize="18px" tts:origin="0px 21px">Hello?</p>
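Reading such a file can be sketched with the Python 3 standard library alone. This is only an illustration under stated assumptions: the function names are made up, the input is already decoded text, and the second sample cue (text split into span elements separated by br) is a guess at how multi-speaker cues are encoded, not a line copied from the real file.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def element_text(elem):
    """Collect the text of an element, turning <br/> into a newline."""
    pieces = [elem.text or ""]
    for child in elem:
        if child.tag == "{%s}br" % TTML_NS:
            pieces.append("\n")
        else:
            pieces.append(element_text(child))
        pieces.append(child.tail or "")
    return "".join(pieces)


def read_cues(ttml_text):
    """Return a list of (id, begin, end, text) tuples, one per <p> element."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter("{%s}p" % TTML_NS):
        cues.append((p.get("{%s}id" % XML_NS), p.get("begin"), p.get("end"),
                     element_text(p).strip()))
    return cues


# Shortened sample; the second cue is hypothetical.
SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
<body><div>
<p xml:id="0" begin="00:01:06:00" end="00:01:08:14">Spartacus, come on, boy!</p>
<p xml:id="1" begin="00:01:10:16" end="00:01:12:16"><span tts:color="white">Hi, Mum.</span><br/><span tts:color="yellow">Hello, sweetheart.</span></p>
</div></body></tt>"""

for cue in read_cues(SAMPLE):
    print(cue)
```

Unlike a line-by-line regular expression, this approach does not depend on each cue sitting on its own line, and it keeps text placed directly inside the p element.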
A small Python script converts this file into a subtitle file (.srt).
I struggled with it because of the byte order mark (BOM):
https://fr.wikipedia.org/wiki/Indicateur_d'ordre_des_octets
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Python 2 script; it needs chardet and BeautifulSoup 3.
import codecs
import os
import re
import sys

import chardet
from BeautifulSoup import BeautifulStoneSoup

# Read the first bytes of the file to work out its encoding.
size = min(32, os.path.getsize(sys.argv[1]))
raw = open(sys.argv[1], 'rb').read(size)
if raw.startswith(codecs.BOM_UTF8):
    encoding = 'utf-8-sig'
else:
    result = chardet.detect(raw)
    encoding = result['encoding']
#print encoding
#encoding = "UTF-16LE"
infile = codecs.open(sys.argv[1], 'r', encoding=encoding)

# Abandoned attempt, kept for reference: strip the BOM by hand and parse
# the whole file with lxml (it needed "from lxml import objectify").
"""
data = infile.read()
print repr(data[0:2])
data = data[2:]
data = data.decode(encoding).encode(encoding)
infile.close()
print(data[0:20])
print objectify.fromstring.__doc__
root = objectify.fromstring(data)
"""

# Skip the header lines that come before the first <p> element.
for i in range(6):
    infile.readline()
lines = infile.readlines()
for line in lines:
    # One cue per line: pick up its id and its begin/end timecodes.
    m = re.search(r'xml:id="(?P<id>\d+)" .*begin="(?P<begin>[0-9,\:]+)" .*end="(?P<end>[0-9,\:]+)"', line)
    if m is not None:
        print m.group('id')
        print m.group('begin'), '-->', m.group('end')
        soup = BeautifulStoneSoup(line)
        # Only text inside <span> elements is printed; a cue whose text
        # sits directly in <p> comes out empty.
        for x in soup.p.findAll('span'):
            print x.string, " "
        print
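The encoding check at the top of the script can also be done without chardet when the file starts with a byte order mark. The following Python 3 sketch shows the idea; the function names and the UTF-8 fallback are assumptions for illustration, not what the script above does.

```python
import codecs

# UTF-32 is tested before UTF-16 because the UTF-32 little-endian mark
# (FF FE 00 00) begins with the UTF-16 little-endian mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name matching the BOM at the start of raw, or None."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return None


def decode_subtitles(raw, fallback="utf-8"):
    """Decode raw bytes; the chosen codecs consume the BOM themselves."""
    return raw.decode(sniff_bom(raw) or fallback)


# A UTF-16 little-endian payload with its BOM, as a stand-in for the real file.
payload = codecs.BOM_UTF16_LE + "<tt>".encode("utf-16-le")
print(sniff_bom(payload), repr(decode_subtitles(payload)))
```

The "utf-16", "utf-32" and "utf-8-sig" codecs strip the mark while decoding, so there is no need to cut the first bytes off by hand.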
Run the script (say it is called itv.py), passing the downloaded .xml subtitle file as its argument:
$ python bbc/itv.py subtitles.xml
which prints something like this at the end:
933
01:07:30:17 --> 01:07:33:24
When's the new nanny starting?
934
01:07:34:00 --> 01:07:36:00
935
01:07:36:01 --> 01:07:39:09
If she doesn't I'm out of here.
Where you going?!
Pub.
936
01:07:39:10 --> 01:07:41:23
Hi, Mum.
Hello, sweetheart.
937
01:07:41:24 --> 01:07:45:03
His new girlfriend.
Ooh.
938
01:07:45:04 --> 01:07:49:09
I'm
still trying to get on this date.
939
01:07:49:10 --> 01:07:52:04
Bye.
940
01:07:52:05 --> 01:07:56:17
941
01:07:56:18 --> 01:07:59:22
942
01:08:00:24 --> 01:08:02:24
and to get the subtitle file:
$ python bbc/itv.py subtitles.xml > maserie_s03e01.srt
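One caveat: the timecodes printed above have the form HH:MM:SS:FF, whereas SRT timestamps are conventionally written HH:MM:SS,mmm, so some players may not accept the file as is. Here is a minimal Python 3 sketch of the conversion. It assumes the last field is a frame count at the 25 frames per second declared by ttp:frameRate in the XML; the function names are made up for illustration.

```python
def timecode_to_srt(timecode, fps=25):
    """Convert 'HH:MM:SS:FF' (FF = frame number) to the SRT form 'HH:MM:SS,mmm'."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    millis = frames * 1000 // fps
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def srt_block(index, begin, end, text, fps=25):
    """Format one subtitle entry: index, time range, text."""
    return "%d\n%s --> %s\n%s\n" % (
        index, timecode_to_srt(begin, fps), timecode_to_srt(end, fps), text)


print(srt_block(1, "01:07:30:17", "01:07:33:24", "When's the new nanny starting?"))
```

At 25 frames per second each frame lasts 40 ms, so frame 17 becomes 680 ms and frame 24 becomes 960 ms.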
Thanks for reading this far.