Posts for the month of January 2015

Goodbye Netflix

Wow. I just checked, and I've had Netflix since 08/10/2001. Over thirteen years. Longer than my marriage. Two houses ago. I'm down to the cheapest one-at-a-time plan, and I still only get around to it every three or four months.

I think it's time to say goodbye.

But here's how they get you to stay:

Based on your 1698 ratings, this is the list of movies and TV shows you've seen. 

Yeah… thirteen and a half years of data that I don't want to lose! And that's just my main account; I have two other profiles too. I searched the 'net for a solution and came up with a lot: Greasemonkey scripts, PHP scripts. None worked.

This was the closest: https://gist.github.com/tloredo/8483682

But that script drives Safari through AppleScript, and I don't have a Mac, so I needed to capture the pages myself. Ninety pages of ratings. So I used DownThemAll!: I opened its download manager directly and entered the URL http://dvd.netflix.com/MoviesYouveSeen?pageNum=[1:90] (I found the count of 90 by trial and error). This saved all the pages to files named MoviesYouveSeen.htm, then MoviesYouveSeen_NNN.htm.
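
DownThemAll! expands the [1:90] part of that URL into one download per page number. As a rough illustration, here is a minimal Python sketch of that expansion. It assumes the range is inclusive at both ends (pages 1 through 90, as used above), and expand_batch is a hypothetical helper, not part of DownThemAll! or the original script. It only builds the URLs; the pages themselves still have to be fetched from a logged-in browser session.

```python
import re

# Matches one numeric batch range such as "[1:90]".
BATCH_RE = re.compile(r'\[(\d+):(\d+)\]')


def expand_batch(pattern):
    """Expand the first [start:end] range in pattern into a list of URLs.

    Assumption: both ends are inclusive, so [1:90] yields 90 URLs.
    A pattern without a range comes back unchanged as a one-item list.
    """
    m = BATCH_RE.search(pattern)
    if m is None:
        return [pattern]
    start, end = int(m.group(1)), int(m.group(2))
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    return [prefix + str(n) + suffix for n in range(start, end + 1)]


if __name__ == '__main__':
    urls = expand_batch('http://dvd.netflix.com/MoviesYouveSeen?pageNum=[1:90]')
    print(len(urls))   # 90
    print(urls[0])     # http://dvd.netflix.com/MoviesYouveSeen?pageNum=1
    print(urls[-1])    # http://dvd.netflix.com/MoviesYouveSeen?pageNum=90
```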

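I modified the script to read these HTML files instead of launching Safari. After that, the ratings were off: every movie in a file got the rating of the first movie in that file. The culprit was the rating XPath, which starts with //; in lxml a path beginning with // searches the whole document even when it's called on a single table cell, so [0] always grabbed the page's first rating. I replaced that lookup with a regular expression on the cell's text. For some reason, some titles don't show a rating in the HTML even though they were supposedly rated. Some are "No Interest," but for others I just don't know what happened. So the script outputs 0.0 when it can't figure out the rating: a 99% solution.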
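
Here is the rating lookup on its own as a minimal Python 3 sketch, assuming the cell's text contains the same "You rated this movie: N.N" phrase that rating_regex in the diff looks for. parse_rating is a hypothetical name and the sample strings are made up. One deliberate difference from the diff: it uses re.search instead of re.match, because match only succeeds when the phrase sits at the very start of the text (leading whitespace is enough to make it fail). Whether that would recover any of the unexplained 0.0 ratings is untested.

```python
import re

# Same pattern as rating_regex in the modified script.
RATING_RE = re.compile(r'You rated this movie: (\d)\.(\d)')


def parse_rating(cell_text):
    """Return the rating in cell_text as 'N.N', or '0.0' if none is found.

    search() scans the whole string; match() would only succeed when the
    text starts with "You rated this movie".
    """
    m = RATING_RE.search(cell_text)
    if m is None:
        return '0.0'  # fallback for "No Interest" or otherwise unparseable cells
    return '%s.%s' % (m.group(1), m.group(2))


if __name__ == '__main__':
    print(parse_rating('You rated this movie: 4.0'))       # 4.0
    print(parse_rating('\n  You rated this movie: 3.0 '))  # 3.0
    print(parse_rating('No Interest'))                     # 0.0
```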

Here are my changes from the GitHub gist version (17 Jan 2014):

  • .py

    @@ -1,3 +1,5 @@
    +#!/bin/env python
    +# Original @ https://gist.github.com/tloredo/8483682
     """
     Scrape a user's Netflix movie ratings by automating a Safari browsing
     session (with the user already logged in).  The ratings are written
    @@ -106,6 +108,10 @@
     
     from jinja2 import Template
     from lxml import html
    +import re
    +
    +fname_regex = re.compile(r'(\w+?)_?(\d+)?\.(\w+)')
    +rating_regex = re.compile(r'You rated this movie: (\d)\.(\d)')
     
     
     # AppleScript functions asrun and asquote (presently unused) are from:
    @@ -159,8 +165,15 @@
         All values are strings.
         """
         # Load the page, grab the HTML, and parse it to a tree.
    -    script = ASTemplate.render(URL=url, DTIME=dtime)
    -    reply = asrun(script)
    +    reply = ''
    +    try:
    +      with open(url) as infile:
    +        for str_ in infile:
    +          reply += str_
    +    except IOError:
    +      return [], None
    +
    +
         tree = html.fromstring(reply)
         rows = tree.xpath('//table[@class="listHeader"]//tr')
     
    @@ -180,22 +193,31 @@
                 # changing from page to page.  For info on XPath for such cases, see:
                 # http://stackoverflow.com/questions/8808921/selecting-a-css-class-with-xpath
                 # rating = data[3].xpath('//span[@class="stbrMaskFg sbmfrt sbmf-50"]')[0].text_content()
    -            rating = data[3].xpath('//span[contains(concat(" ", normalize-space(@class), " "), " stbrMaskFg ")]')[0].text_content()
    -            rating = rating.split(':')[1].strip()  # keep only the number
    +            rating_cut = rating_regex.match(data[3].text_content())
    +            rating = '0.0'
    +            if rating_cut:
    +               rating = "%s.%s"%(rating_cut.group(1), rating_cut.group(2))
    +
                 info.append((title, year, genre, rating))
     
         # Next URL to load:
    -    next_elem = tree.xpath('//li[@class="navItem paginationLink paginationLink-next"]/a')
    -    if next_elem:
    -        next_url = next_elem[0].get('href')
    -    else:  # empty list
    -        next_url = None
    +    fname_cut = fname_regex.match(url)
    +    if fname_cut:
    +      if None == fname_cut.group(2):
    +        num = 0
    +      else:
    +        num = fname_cut.group(2)
    +      next_url = "%s_%03.f.%s"%(fname_cut.group(1),int(num)+1,fname_cut.group(3))
    +    else:
    +      print "Regex failed."
    +      next_url = None
    +
     
         return info, next_url
     
     
     # Use this initial URL for DVD accounts:
    -url = 'http://dvd.netflix.com/MoviesYouveSeen'
    +url = 'MoviesYouveSeen.htm'
     # Use this initial URL for streaming accounts:
     # url = 'http://movies.netflix.com/MoviesYouveSeen'
     

This leaves a lot of the script as dead code (the AppleScript/Safari parts), but trimming it would only make the diff larger, so I didn't remove anything else.
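
Instead of following the page's "next" link, the new code computes the next file name from the current one. Here is the same logic as a standalone Python 3 sketch, reusing fname_regex from the diff. next_filename is a hypothetical name; it returns None instead of printing "Regex failed.", and %03d stands in for the diff's %03.f (both zero-pad to three digits for these values).

```python
import re

# Same pattern as fname_regex in the diff: base name, optional "_NNN",
# then the extension.
FNAME_RE = re.compile(r'(\w+?)_?(\d+)?\.(\w+)')


def next_filename(name):
    """Return the saved page that follows name, or None if name doesn't parse."""
    m = FNAME_RE.match(name)
    if m is None:
        return None
    base, num, ext = m.groups()
    n = int(num) if num is not None else 0  # the first page has no number
    return '%s_%03d.%s' % (base, n + 1, ext)


if __name__ == '__main__':
    print(next_filename('MoviesYouveSeen.htm'))      # MoviesYouveSeen_001.htm
    print(next_filename('MoviesYouveSeen_001.htm'))  # MoviesYouveSeen_002.htm
    print(next_filename('MoviesYouveSeen_089.htm'))  # MoviesYouveSeen_090.htm
```

The lazy (\w+?) matters here: it stops the base name before the optional "_NNN", so the number lands in its own group instead of being swallowed into the name.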

Here's the output when I ran it against my "TV Queue" account (yeah, they're not all TV; sometimes I accidentally rated things with the wrong profile):

$ ./ScrapeNetflixRatings.py
Scraping MoviesYouveSeen.htm
1:  Garmin Streetpilot 2610/2650 GPS (2003) [Special Interest] - 1.0
2:  Six Feet Under (2001) [Television] - 0.0

Scraping MoviesYouveSeen_001.htm
3:  The Thief of Bagdad (1924) [Classics] - 4.0
4:  The Tick (2001) [Television] - 4.0
5:  Michael Palin: Pole to Pole (1992) [Documentary] - 0.0
6:  Kung Fu: Season 3 (1974) [Television] - 0.0
7:  Danger Mouse (1981) [Children & Family] - 3.0
8:  Farscape (1999) [Television] - 3.0
9:  Helvetica (2007) [Documentary] - 3.0
10:  Hogan's Heroes (1965) [Television] - 3.0
11:  The Lion in Winter (2003) [Drama] - 3.0
12:  Monty Python: John Cleese's Best (2005) [Television] - 3.0
13:  Sarah Silverman: Jesus Is Magic (2005) [Comedy] - 3.0
14:  Stephen King's It (1990) [Horror] - 3.0
15:  Superman II (1980) [Action & Adventure] - 3.0
16:  Superman: The Movie (1978) [Classics] - 3.0
17:  Tom Brown's Schooldays (1951) [Drama] - 3.0
18:  An Evening with Kevin Smith 2 (2006) [Comedy] - 0.0
19:  Crimewave (1986) [Comedy] - 2.0
20:  Huff (2004) [Television] - 2.0
21:  Aqua Teen Hunger Force (2000) [Television] - 1.0
22:  The Boondocks (2005) [Television] - 1.0

Scraping MoviesYouveSeen_002.htm
23:  Ricky Gervais: Out of England (2008) [Comedy] - 5.0
24:  Robot Chicken (2005) [Television] - 5.0
25:  Robot Chicken Star Wars (2007) [Comedy] - 5.0
26:  Rome (2005) [Television] - 5.0
27:  Scrubs (2001) [Television] - 5.0
28:  Stewie Griffin: The Untold Story (2005) [Television] - 5.0
29:  Spaced: The Complete Series (1999) [Television] - 0.0
30:  Alice (2009) [Sci-Fi & Fantasy] - 0.0
31:  Best of the Chris Rock Show: Vol. 1 (1999) [Television] - 4.0
32:  The Critic: The Complete Series (1994) [Television] - 4.0
33:  Dilbert (1999) [Television] - 4.0
34:  An Evening with Kevin Smith (2002) [Comedy] - 4.0
35:  John Adams (2008) [Drama] - 4.0
36:  King of the Hill (1997) [Television] - 4.0
37:  The Lone Gunmen: The Complete Series (2001) [Television] - 4.0
38:  Neverwhere (1996) [Sci-Fi & Fantasy] - 4.0
39:  Robin Hood (2006) [Television] - 4.0
40:  The Sand Pebbles (1966) [Classics] - 4.0
41:  The Sarah Silverman Program (2007) [Television] - 4.0
42:  The Silence of the Lambs (1991) [Thrillers] - 4.0

Scraping MoviesYouveSeen_003.htm
43:  Alias (2001) [Television] - 5.0
44:  Alien (1979) [Sci-Fi & Fantasy] - 5.0
45:  Band of Brothers (2001) [Drama] - 5.0
46:  Bleak House (2005) [Drama] - 5.0
47:  Brisco County, Jr.: Complete Series (1993) [Television] - 5.0
48:  Code Monkeys (2007) [Television] - 5.0
49:  Coupling (2000) [Television] - 5.0
50:  Dead Like Me (2003) [Television] - 5.0
51:  Deadwood (2004) [Television] - 5.0
52:  Family Guy (1999) [Television] - 5.0
53:  Family Guy: Blue Harvest (2007) [Television] - 5.0
54:  Firefly (2002) [Television] - 5.0
55:  Futurama (1999) [Television] - 5.0
56:  Futurama the Movie: Bender's Big Score (2007) [Television] - 5.0
57:  The Great Escape (1963) [Classics] - 5.0
58:  Greg the Bunny (2002) [Television] - 5.0
59:  How I Met Your Mother (2005) [Television] - 5.0
60:  MI-5 (2002) [Television] - 5.0
61:  My Name Is Earl (2005) [Television] - 5.0
62:  Police Squad!: The Complete Series (1982) [Television] - 5.0

Scraping MoviesYouveSeen_004.htm
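
The run ends on "Scraping MoviesYouveSeen_004.htm" with no titles. That most likely fits the modified code: when a file can't be opened, it returns an empty list and no next URL, so (assuming the main loop only continues while a next URL comes back) the walk stops at the first missing file. A file that opened but held no ratings would instead move on to the next number. Below is a minimal Python 3 sketch of that stop condition; page_name and walk_pages are hypothetical helpers, and the page files are fake ones written to a temporary directory.

```python
import os
import tempfile


def page_name(i):
    """Hypothetical naming scheme matching the post's files:
    page 0 is MoviesYouveSeen.htm, page i > 0 is MoviesYouveSeen_NNN.htm."""
    if i == 0:
        return 'MoviesYouveSeen.htm'
    return 'MoviesYouveSeen_%03d.htm' % i


def walk_pages(directory):
    """Read saved pages in order and stop at the first one that can't be opened."""
    pages = []
    i = 0
    while True:
        name = page_name(i)
        print('Scraping %s' % name)
        try:
            with open(os.path.join(directory, name)) as infile:
                pages.append(infile.read())
        except OSError:  # Python 2's IOError is an alias of OSError in Python 3
            break  # like the diff's "return [], None": no next page
        i += 1
    return pages


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        # Pages 0-3 exist, so page 4 is the first missing file.
        for i in range(4):
            with open(os.path.join(d, page_name(i)), 'w') as f:
                f.write('<html>page %d</html>' % i)
        print(len(walk_pages(d)))  # 4
```

Run as a script, this prints five "Scraping" lines (the last one for MoviesYouveSeen_004.htm) before the count, which is the same shape as the output above.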


Thanks a ton to the original author! The full modified script is attached here for posterity.