Goodbye Netflix
Wow. I just checked, and I've had Netflix since 08/10/2001. Over thirteen years. Longer than my marriage. Two houses ago. I'm down to the cheapest one-at-a-time plan, and I still only get around to it every three or four months.
I think it's time to say goodbye.
But here's how they get you to stay:
"Based on your 1698 ratings, this is the list of movies and TV shows you've seen."
Yeah… thirteen and a half years of data that I don't want to lose! And that's my main account - I have two other profiles too. I searched the 'net for a solution, and came up with a lot. None worked. GreaseMonkey ones. PHP ones. None worked.
This was the closest: https://gist.github.com/tloredo/8483682
But I don't have a Mac, so I needed to manually capture that info. Ninety pages of ratings. So I used DownThemAll!. I opened the download manager manually, and for the URL I used http://dvd.netflix.com/MoviesYouveSeen?pageNum=[1:90] - I had manually determined 90 with some trial and error. This saved all the pages to files named MoviesYouveSeen.htm and then MoviesYouveSeen_NNN.htm.
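For anyone without DownThemAll!, the batch range above just expands to a plain list of URLs. Here is a minimal sketch in Python 3, assuming the same 90-page count I found by trial and error. The helper name page_urls is hypothetical and not part of the original script, and actually fetching these pages would still need the cookies of a logged-in session, which this sketch does not attempt.

```python
# Hypothetical helper: expands the batch range [1:last_page] into explicit URLs.
BASE_URL = "http://dvd.netflix.com/MoviesYouveSeen"


def page_urls(last_page, base_url=BASE_URL):
    """Return one URL per ratings page, numbered 1 through last_page inclusive."""
    return ["%s?pageNum=%d" % (base_url, n) for n in range(1, last_page + 1)]


if __name__ == "__main__":
    # 90 is the page count for my account; yours will differ.
    for url in page_urls(90):
        print(url)
```

The printed list can be pasted into any download manager that shares the browser's login session.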
I modified the script to read these HTML files instead of launching Safari. After that, the ratings were off - every movie in the file would have the rating of the first in the file (the rating XPath started with //, which searches the whole page rather than just that row's cell). So I tweaked that. For some reason, some titles don't show a rating in the HTML, even though they were supposedly rated. Some are "No Interest," but for the others, I just don't know what happened. So I have it output 0.0 if it can't figure it out - a 99% solution.
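The fallback logic boils down to something like the following sketch in Python 3. The pattern is the one from my modified script; the function name extract_rating and the sample strings are made up for illustration, and I use search here instead of match so leading whitespace in the cell text doesn't matter.

```python
import re

# Same pattern as in the modified script: one digit, a dot, one digit.
RATING_RE = re.compile(r"You rated this movie: (\d)\.(\d)")


def extract_rating(cell_text):
    """Return the rating as a string such as '4.0', or '0.0' when none is found."""
    found = RATING_RE.search(cell_text)
    if found:
        return "%s.%s" % (found.group(1), found.group(2))
    # "No Interest" and cells with no visible rating both end up here.
    return "0.0"


if __name__ == "__main__":
    # Hypothetical cell texts, not copied from a real Netflix page.
    for text in ("You rated this movie: 4.0", "No Interest", ""):
        print(repr(text), "->", extract_rating(text))
```

Because the function only looks at the text of one table cell, it can't pick up another row's rating by accident.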
Here are my changes from the gist (17 Jan 2014) version (depending on screen width, you might have to scroll way down):
@@ -1,3 +1,5 @@
+#!/bin/env python
+# Original @ https://gist.github.com/tloredo/8483682
 """
 Scrape a user's Netflix movie ratings by automating a Safari browsing
 session (with the user already logged in). The ratings are written
@@ -106,6 +108,10 @@
 
 from jinja2 import Template
 from lxml import html
+import re
+
+fname_regex = re.compile(r'(\w+?)_?(\d+)?\.(\w+)')
+rating_regex = re.compile(r'You rated this movie: (\d)\.(\d)')
 
 
 # AppleScript functions asrun and asquote (presently unused) are from:
@@ -159,8 +165,15 @@
     All values are strings.
     """
     # Load the page, grab the HTML, and parse it to a tree.
-    script = ASTemplate.render(URL=url, DTIME=dtime)
-    reply = asrun(script)
+    reply = ''
+    try:
+        with open(url) as infile:
+            for str_ in infile:
+                reply += str_
+    except IOError:
+        return [], None
+
+
     tree = html.fromstring(reply)
     rows = tree.xpath('//table[@class="listHeader"]//tr')
 
@@ -180,22 +193,31 @@
         # changing from page to page. For info on XPath for such cases, see:
         # http://stackoverflow.com/questions/8808921/selecting-a-css-class-with-xpath
         # rating = data[3].xpath('//span[@class="stbrMaskFg sbmfrt sbmf-50"]')[0].text_content()
-        rating = data[3].xpath('//span[contains(concat(" ", normalize-space(@class), " "), " stbrMaskFg ")]')[0].text_content()
-        rating = rating.split(':')[1].strip() # keep only the number
+        rating_cut = rating_regex.match(data[3].text_content())
+        rating = '0.0'
+        if rating_cut:
+            rating = "%s.%s"%(rating_cut.group(1), rating_cut.group(2))
+
         info.append((title, year, genre, rating))
 
     # Next URL to load:
-    next_elem = tree.xpath('//li[@class="navItem paginationLink paginationLink-next"]/a')
-    if next_elem:
-        next_url = next_elem[0].get('href')
-    else: # empty list
-        next_url = None
+    fname_cut = fname_regex.match(url)
+    if fname_cut:
+        if None == fname_cut.group(2):
+            num = 0
+        else:
+            num = fname_cut.group(2)
+        next_url = "%s_%03.f.%s"%(fname_cut.group(1),int(num)+1,fname_cut.group(3))
+    else:
+        print "Regex failed."
+        next_url = None
+
 
     return info, next_url
 
 
 # Use this initial URL for DVD accounts:
-url = 'http://dvd.netflix.com/MoviesYouveSeen'
+url = 'MoviesYouveSeen.htm'
 # Use this initial URL for streaming accounts:
 # url = 'http://movies.netflix.com/MoviesYouveSeen'
 
This renders a lot of the script useless, but there's no benefit in making the diff larger so I didn't trim anything else.
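Since the "next page" link in the HTML is no longer followed, the script has to guess the next file name from the current one. Here's that step pulled out on its own as a Python 3 sketch. The regex is the one from the diff; the function name next_filename is hypothetical, and I've used %03d where my script has the odder %03.f, which produces the same zero-padded number.

```python
import re

# From the diff: base name, optional "_NNN" counter, then the extension.
FNAME_RE = re.compile(r"(\w+?)_?(\d+)?\.(\w+)")


def next_filename(name):
    """Guess the file DownThemAll! saved after `name`; None if the name doesn't parse."""
    found = FNAME_RE.match(name)
    if found is None:
        return None
    # The first saved file has no counter, so treat it as number 0.
    num = int(found.group(2)) if found.group(2) is not None else 0
    return "%s_%03d.%s" % (found.group(1), num + 1, found.group(3))


if __name__ == "__main__":
    name = "MoviesYouveSeen.htm"
    for _ in range(3):
        print(name)
        name = next_filename(name)
```

The scraper keeps asking for the next name until opening a file fails, which is why the modified script returns an empty list and None on IOError.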
Here's what I got when I ran it across my "TV Queue" account - yeah, they're not all TV; sometimes I accidentally rated things with the wrong profile:
$ ./ScrapeNetflixRatings.py
Scraping MoviesYouveSeen.htm
1: Garmin Streetpilot 2610/2650 GPS (2003) [Special Interest] - 1.0
2: Six Feet Under (2001) [Television] - 0.0
Scraping MoviesYouveSeen_001.htm
3: The Thief of Bagdad (1924) [Classics] - 4.0
4: The Tick (2001) [Television] - 4.0
5: Michael Palin: Pole to Pole (1992) [Documentary] - 0.0
6: Kung Fu: Season 3 (1974) [Television] - 0.0
7: Danger Mouse (1981) [Children & Family] - 3.0
8: Farscape (1999) [Television] - 3.0
9: Helvetica (2007) [Documentary] - 3.0
10: Hogan's Heroes (1965) [Television] - 3.0
11: The Lion in Winter (2003) [Drama] - 3.0
12: Monty Python: John Cleese's Best (2005) [Television] - 3.0
13: Sarah Silverman: Jesus Is Magic (2005) [Comedy] - 3.0
14: Stephen King's It (1990) [Horror] - 3.0
15: Superman II (1980) [Action & Adventure] - 3.0
16: Superman: The Movie (1978) [Classics] - 3.0
17: Tom Brown's Schooldays (1951) [Drama] - 3.0
18: An Evening with Kevin Smith 2 (2006) [Comedy] - 0.0
19: Crimewave (1986) [Comedy] - 2.0
20: Huff (2004) [Television] - 2.0
21: Aqua Teen Hunger Force (2000) [Television] - 1.0
22: The Boondocks (2005) [Television] - 1.0
Scraping MoviesYouveSeen_002.htm
23: Ricky Gervais: Out of England (2008) [Comedy] - 5.0
24: Robot Chicken (2005) [Television] - 5.0
25: Robot Chicken Star Wars (2007) [Comedy] - 5.0
26: Rome (2005) [Television] - 5.0
27: Scrubs (2001) [Television] - 5.0
28: Stewie Griffin: The Untold Story (2005) [Television] - 5.0
29: Spaced: The Complete Series (1999) [Television] - 0.0
30: Alice (2009) [Sci-Fi & Fantasy] - 0.0
31: Best of the Chris Rock Show: Vol. 1 (1999) [Television] - 4.0
32: The Critic: The Complete Series (1994) [Television] - 4.0
33: Dilbert (1999) [Television] - 4.0
34: An Evening with Kevin Smith (2002) [Comedy] - 4.0
35: John Adams (2008) [Drama] - 4.0
36: King of the Hill (1997) [Television] - 4.0
37: The Lone Gunmen: The Complete Series (2001) [Television] - 4.0
38: Neverwhere (1996) [Sci-Fi & Fantasy] - 4.0
39: Robin Hood (2006) [Television] - 4.0
40: The Sand Pebbles (1966) [Classics] - 4.0
41: The Sarah Silverman Program (2007) [Television] - 4.0
42: The Silence of the Lambs (1991) [Thrillers] - 4.0
Scraping MoviesYouveSeen_003.htm
43: Alias (2001) [Television] - 5.0
44: Alien (1979) [Sci-Fi & Fantasy] - 5.0
45: Band of Brothers (2001) [Drama] - 5.0
46: Bleak House (2005) [Drama] - 5.0
47: Brisco County, Jr.: Complete Series (1993) [Television] - 5.0
48: Code Monkeys (2007) [Television] - 5.0
49: Coupling (2000) [Television] - 5.0
50: Dead Like Me (2003) [Television] - 5.0
51: Deadwood (2004) [Television] - 5.0
52: Family Guy (1999) [Television] - 5.0
53: Family Guy: Blue Harvest (2007) [Television] - 5.0
54: Firefly (2002) [Television] - 5.0
55: Futurama (1999) [Television] - 5.0
56: Futurama the Movie: Bender's Big Score (2007) [Television] - 5.0
57: The Great Escape (1963) [Classics] - 5.0
58: Greg the Bunny (2002) [Television] - 5.0
59: How I Met Your Mother (2005) [Television] - 5.0
60: MI-5 (2002) [Television] - 5.0
61: My Name Is Earl (2005) [Television] - 5.0
62: Police Squad!: The Complete Series (1982) [Television] - 5.0
Scraping MoviesYouveSeen_004.htm
Thanks a ton to the original author, and the full version is attached here for posterity.
Attachments (1)
- ScrapeNetflixRatings.py (8.9 KB)