Posts in category python

Goodbye Netflix

Wow. I just checked, and I've had Netflix since 08/10/2001. Over thirteen years. Longer than my marriage. Two houses ago. I'm down to the cheapest one-at-a-time plan, and I still get around to it every three or four months.

I think it's time to say goodbye.

But here's how they get you to stay:

Based on your 1698 ratings, this is the list of movies and TV shows you've seen. 

Yeah... thirteen and a half years of data that I don't want to lose! And that's my main account - I have two other profiles too. I searched the 'net for a solution, and came up with a lot. None worked. GreaseMonkey ones. PHP ones. None worked.

This was the closest: https://gist.github.com/tloredo/8483682

But I don't have a Mac, so I needed to manually capture that info. Ninety pages of ratings. So I used DownThemAll!. I opened the download manager manually, and for the URL I used http://dvd.netflix.com/MoviesYouveSeen?pageNum=[1:90] - I had manually determined 90 with some trial and error. This saved all the pages to files named MoviesYouveSeen.htm and then MoviesYouveSeen_NNN.htm.
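
The batch descriptor and the resulting file names can be sketched in a few lines of Python. This is only an illustration: `page_urls` and `local_names` are made-up helper names, and the mapping of page N to MoviesYouveSeen_NNN.htm with NNN = N - 1 is an assumption based on the file names described above, not something DownThemAll! guarantees.

```python
BASE_URL = 'http://dvd.netflix.com/MoviesYouveSeen'


def page_urls(last_page, base_url=BASE_URL):
    """Expand the batch descriptor pageNum=[1:last_page] into a list of URLs."""
    return ['%s?pageNum=%d' % (base_url, n) for n in range(1, last_page + 1)]


def local_names(last_page, stem='MoviesYouveSeen', ext='htm'):
    """Guess the saved file names: first one bare, then _001, _002, ..."""
    names = ['%s.%s' % (stem, ext)]
    names += ['%s_%03d.%s' % (stem, n, ext) for n in range(1, last_page)]
    return names


if __name__ == '__main__':
    # Print which URL is assumed to have landed in which file.
    for url, name in zip(page_urls(90), local_names(90)):
        print('%s -> %s' % (url, name))
```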

I modified the script to read these HTML files instead of launching Safari. After that, the ratings were off - every movie in the file would have the rating of the first in the file. So I tweaked that. For some reason, some don't show a rating in the HTML, even when these were supposedly rated. Some are "No Interest," but others, I just don't know what happened. So I have it output 0.0 if it couldn't figure it out - a 99% solution.
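
The "0.0 if it couldn't figure it out" fallback boils down to something like the following. It is a stand-alone sketch, not the script itself: `extract_rating` is a hypothetical helper name, and the sample strings in the usage lines are invented inputs; only the regular expression is taken from the modified script.

```python
import re

# Pattern from the modified script: one digit, a dot, one digit.
rating_regex = re.compile(r'You rated this movie: (\d)\.(\d)')


def extract_rating(cell_text):
    """Return the rating as a string like '4.0', or '0.0' if none is found."""
    found = rating_regex.match(cell_text)
    if found:
        return '%s.%s' % (found.group(1), found.group(2))
    return '0.0'  # unrated, "No Interest", or simply unparseable


if __name__ == '__main__':
    print(extract_rating('You rated this movie: 4.0'))
    print(extract_rating('No Interest'))
```

Note that `match` only looks at the start of the text, so a cell with leading whitespace would also fall through to 0.0; `search` would be more forgiving if that turns out to matter.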

Here are my changes from the gist (17 Jan 2014) version (depending on screen width, you might have to scroll way down):

  • .py

@@ -1,3 +1,5 @@
+#!/bin/env python
+# Original @ https://gist.github.com/tloredo/8483682
 """
 Scrape a user's Netflix movie ratings by automating a Safari browsing
 session (with the user already logged in).  The ratings are written
@@ -106,6 +108,10 @@
 
 from jinja2 import Template
 from lxml import html
+import re
+
+fname_regex = re.compile(r'(\w+?)_?(\d+)?\.(\w+)')
+rating_regex = re.compile(r'You rated this movie: (\d)\.(\d)')
 
 
 # AppleScript functions asrun and asquote (presently unused) are from:
@@ -159,8 +165,15 @@
     All values are strings.
     """
     # Load the page, grab the HTML, and parse it to a tree.
-    script = ASTemplate.render(URL=url, DTIME=dtime)
-    reply = asrun(script)
+    reply = ''
+    try:
+      with open(url) as infile:
+        for str_ in infile:
+          reply += str_
+    except IOError:
+      return [], None
+
+
     tree = html.fromstring(reply)
     rows = tree.xpath('//table[@class="listHeader"]//tr')
 
@@ -180,22 +193,31 @@
             # changing from page to page.  For info on XPath for such cases, see:
             # http://stackoverflow.com/questions/8808921/selecting-a-css-class-with-xpath
             # rating = data[3].xpath('//span[@class="stbrMaskFg sbmfrt sbmf-50"]')[0].text_content()
-            rating = data[3].xpath('//span[contains(concat(" ", normalize-space(@class), " "), " stbrMaskFg ")]')[0].text_content()
-            rating = rating.split(':')[1].strip()  # keep only the number
+            rating_cut = rating_regex.match(data[3].text_content())
+            rating = '0.0'
+            if rating_cut:
+               rating = "%s.%s"%(rating_cut.group(1), rating_cut.group(2))
+
             info.append((title, year, genre, rating))
 
     # Next URL to load:
-    next_elem = tree.xpath('//li[@class="navItem paginationLink paginationLink-next"]/a')
-    if next_elem:
-        next_url = next_elem[0].get('href')
-    else:  # empty list
-        next_url = None
+    fname_cut = fname_regex.match(url)
+    if fname_cut:
+      if None == fname_cut.group(2):
+        num = 0
+      else:
+        num = fname_cut.group(2)
+      next_url = "%s_%03.f.%s"%(fname_cut.group(1),int(num)+1,fname_cut.group(3))
+    else:
+      print "Regex failed."
+      next_url = None
+
 
     return info, next_url
 
 
 # Use this initial URL for DVD accounts:
-url = 'http://dvd.netflix.com/MoviesYouveSeen'
+url = 'MoviesYouveSeen.htm'
 # Use this initial URL for streaming accounts:
 # url = 'http://movies.netflix.com/MoviesYouveSeen'
 

This renders a lot of the script useless, but there's no benefit in making the diff larger so I didn't trim anything else.
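
The "next page" step in the diff, which derives the next file name instead of following a pagination link, can be tried in isolation. A minimal sketch, assuming the DownThemAll! naming described earlier; `next_filename` is a made-up name, and it uses `%03d` where the diff uses `%03.f`, which produces the same zero-padded number.

```python
import re

# Same pattern as in the diff: stem, optional _NNN counter, extension.
fname_regex = re.compile(r'(\w+?)_?(\d+)?\.(\w+)')


def next_filename(fname):
    """Return the name of the file after fname, or None if it can't be parsed."""
    found = fname_regex.match(fname)
    if not found:
        return None
    stem, num, ext = found.groups()
    # A missing counter (the very first file) counts as 0.
    return '%s_%03d.%s' % (stem, int(num or 0) + 1, ext)


if __name__ == '__main__':
    name = 'MoviesYouveSeen.htm'
    for _ in range(3):
        print(name)
        name = next_filename(name)
```

The scraper stops when the computed file does not exist, which is why the run ends on a "Scraping" line with nothing under it.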

Here's what I got when I ran it against my "TV Queue" account - yeah, they're not all TV; sometimes I accidentally rated things with the wrong profile:

$ ./ScrapeNetflixRatings.py
Scraping MoviesYouveSeen.htm
1:  Garmin Streetpilot 2610/2650 GPS (2003) [Special Interest] - 1.0
2:  Six Feet Under (2001) [Television] - 0.0

Scraping MoviesYouveSeen_001.htm
3:  The Thief of Bagdad (1924) [Classics] - 4.0
4:  The Tick (2001) [Television] - 4.0
5:  Michael Palin: Pole to Pole (1992) [Documentary] - 0.0
6:  Kung Fu: Season 3 (1974) [Television] - 0.0
7:  Danger Mouse (1981) [Children & Family] - 3.0
8:  Farscape (1999) [Television] - 3.0
9:  Helvetica (2007) [Documentary] - 3.0
10:  Hogan's Heroes (1965) [Television] - 3.0
11:  The Lion in Winter (2003) [Drama] - 3.0
12:  Monty Python: John Cleese's Best (2005) [Television] - 3.0
13:  Sarah Silverman: Jesus Is Magic (2005) [Comedy] - 3.0
14:  Stephen King's It (1990) [Horror] - 3.0
15:  Superman II (1980) [Action & Adventure] - 3.0
16:  Superman: The Movie (1978) [Classics] - 3.0
17:  Tom Brown's Schooldays (1951) [Drama] - 3.0
18:  An Evening with Kevin Smith 2 (2006) [Comedy] - 0.0
19:  Crimewave (1986) [Comedy] - 2.0
20:  Huff (2004) [Television] - 2.0
21:  Aqua Teen Hunger Force (2000) [Television] - 1.0
22:  The Boondocks (2005) [Television] - 1.0

Scraping MoviesYouveSeen_002.htm
23:  Ricky Gervais: Out of England (2008) [Comedy] - 5.0
24:  Robot Chicken (2005) [Television] - 5.0
25:  Robot Chicken Star Wars (2007) [Comedy] - 5.0
26:  Rome (2005) [Television] - 5.0
27:  Scrubs (2001) [Television] - 5.0
28:  Stewie Griffin: The Untold Story (2005) [Television] - 5.0
29:  Spaced: The Complete Series (1999) [Television] - 0.0
30:  Alice (2009) [Sci-Fi & Fantasy] - 0.0
31:  Best of the Chris Rock Show: Vol. 1 (1999) [Television] - 4.0
32:  The Critic: The Complete Series (1994) [Television] - 4.0
33:  Dilbert (1999) [Television] - 4.0
34:  An Evening with Kevin Smith (2002) [Comedy] - 4.0
35:  John Adams (2008) [Drama] - 4.0
36:  King of the Hill (1997) [Television] - 4.0
37:  The Lone Gunmen: The Complete Series (2001) [Television] - 4.0
38:  Neverwhere (1996) [Sci-Fi & Fantasy] - 4.0
39:  Robin Hood (2006) [Television] - 4.0
40:  The Sand Pebbles (1966) [Classics] - 4.0
41:  The Sarah Silverman Program (2007) [Television] - 4.0
42:  The Silence of the Lambs (1991) [Thrillers] - 4.0

Scraping MoviesYouveSeen_003.htm
43:  Alias (2001) [Television] - 5.0
44:  Alien (1979) [Sci-Fi & Fantasy] - 5.0
45:  Band of Brothers (2001) [Drama] - 5.0
46:  Bleak House (2005) [Drama] - 5.0
47:  Brisco County, Jr.: Complete Series (1993) [Television] - 5.0
48:  Code Monkeys (2007) [Television] - 5.0
49:  Coupling (2000) [Television] - 5.0
50:  Dead Like Me (2003) [Television] - 5.0
51:  Deadwood (2004) [Television] - 5.0
52:  Family Guy (1999) [Television] - 5.0
53:  Family Guy: Blue Harvest (2007) [Television] - 5.0
54:  Firefly (2002) [Television] - 5.0
55:  Futurama (1999) [Television] - 5.0
56:  Futurama the Movie: Bender's Big Score (2007) [Television] - 5.0
57:  The Great Escape (1963) [Classics] - 5.0
58:  Greg the Bunny (2002) [Television] - 5.0
59:  How I Met Your Mother (2005) [Television] - 5.0
60:  MI-5 (2002) [Television] - 5.0
61:  My Name Is Earl (2005) [Television] - 5.0
62:  Police Squad!: The Complete Series (1982) [Television] - 5.0

Scraping MoviesYouveSeen_004.htm


Thanks a ton to the original author, and the full version is attached here for posterity.

IP Address in Python (Windows)

From StackOverflow, my changes:

  • Py3 compat (no big deal)
  • Added DHCP support
  • Use CurrentControlSet (saner IMHO)

import os
import sys
import winreg as _winreg # Hack for py3 compared to original SO post


def main():
    adapter_list_key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE,
        r'SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkCards')

    adapter_count = _winreg.QueryInfoKey(adapter_list_key)[0]

    for i in range(adapter_count):
        sub_key_name = _winreg.EnumKey(adapter_list_key, i)
        adapter_key = _winreg.OpenKey(adapter_list_key, sub_key_name)
        (adapter_service_name, _) = _winreg.QueryValueEx(adapter_key, "ServiceName")
        (description, _) = _winreg.QueryValueEx(adapter_key, "Description")

        adapter_registry_path = os.path.join(r'SYSTEM\CurrentControlSet\Services',
            adapter_service_name, "Parameters", "Tcpip")
        adapter_service_key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE,
            adapter_registry_path)
        (subnet_mask, _) = _winreg.QueryValueEx(adapter_service_key, "SubnetMask")
        if ['0.0.0.0'] == subnet_mask:
            (subnet_mask, type_) = _winreg.QueryValueEx(adapter_service_key, "DhcpSubnetMask")
            if _winreg.REG_SZ == type_:
                subnet_mask = [subnet_mask] # Make 1-element list to match non-DHCP
        (ip_address, _) = _winreg.QueryValueEx(adapter_service_key, "IpAddress")
        if ['0.0.0.0'] == ip_address:
            (ip_address, type_) = _winreg.QueryValueEx(adapter_service_key, "DhcpIPAddress")
            if _winreg.REG_SZ == type_:
                ip_address = [ip_address] # Make 1-element list to match non-DHCP
        sys.stdout.write("Name: %s\n" % adapter_service_name)
        sys.stdout.write("Description: %s\n" % description)
        sys.stdout.write("SubnetMask: %s\n" % subnet_mask)
        sys.stdout.write("IpAddress: %s\n" % ip_address)


if __name__ == "__main__":
    main()
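
The subnet mask and IP address lookups repeat the same static-then-DHCP fallback, so it could be factored out. This is only a sketch of one way to do it: `static_or_dhcp` is a hypothetical helper, and it takes the lookup as a callable so it can be exercised without a registry.

```python
REG_SZ = 1  # same numeric value as winreg.REG_SZ


def static_or_dhcp(query, static_name, dhcp_name):
    """Return the static value, falling back to the DHCP one.

    query(name) must return a (value, type) pair, like winreg.QueryValueEx.
    """
    value, _ = query(static_name)
    if value == ['0.0.0.0']:
        value, type_ = query(dhcp_name)
        if type_ == REG_SZ:
            value = [value]  # 1-element list to match the static form
    return value


if __name__ == '__main__':
    # Fake registry values standing in for a DHCP-configured adapter.
    fake = {'IpAddress': (['0.0.0.0'], 7), 'DhcpIPAddress': ('192.168.1.5', 1)}
    print(static_or_dhcp(fake.__getitem__, 'IpAddress', 'DhcpIPAddress'))
```

In the script above it would be called with something like `lambda name: _winreg.QueryValueEx(adapter_service_key, name)` as the `query` argument.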

Python deepcopy broken

Well, that was annoying... I spent a long time last Friday and today finding out that Python 2.7's copy.deepcopy doesn't play well with xml.dom.minidom. See this bug report.

The workaround is to use "doc.cloneNode(True)" instead.
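
For illustration, here is the workaround on a tiny document. The XML and the variable names are invented for the example; the point is only that `cloneNode(True)` gives a deep copy that can be changed without touching the original.

```python
from xml.dom import minidom

doc = minidom.parseString('<root><item id="1">text</item></root>')

# Deep-copy the whole document instead of calling copy.deepcopy(doc).
clone = doc.cloneNode(True)

# Change the copy only.
clone.documentElement.firstChild.setAttribute('id', '2')

print(doc.documentElement.toxml())    # original is untouched
print(clone.documentElement.toxml())  # copy carries the change
```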

Email your new IP address with TomatoUSB

So my router is now TomatoUSB and I wanted an alert when the IP changed. Sure, I could probably put something local on the router, but where's the fun in that?

So I put together a quick Python script to drop me an email if the IP ever changes. Yes, TomatoUSB supports various Dynamic DNS services, but it doesn't seem to natively support "email me."

So on the DDNS setup page, I chose the "Custom URL" service, and I put in "http:[email protected]" as the URL (the internal address of an Apache server running WSGI).

I have a custom config file /etc/httpd/conf.d/wsgi_IP as follows:

WSGIScriptAlias /IPCHECKS /var/www/wsgi/IP.wsgi

<Directory "/var/www/wsgi/">
  WSGIApplicationGroup %{GLOBAL}
  Order deny,allow
  Deny from all
  Allow from 192 127 ::1
</Directory>

HOPEFULLY that means none of you can change what I think my IP address is. ;)

Here's the actual Python script (/var/www/wsgi/IP.wsgi):

from __future__ import print_function
from cgi import parse_qs, escape
import socket
import smtplib

# This is RevRagnarok's ugly IP checker.
# Tomato (firmware) will post to us with a "new_ip" parameter
# At this point, I want to see manually that the IPs change, not have it autoupdate
# Note: I had to enable HTTP sending email in SELinux:
# setsebool -P httpd_can_sendmail 1

def application(environ, start_response):
    parameters = parse_qs(environ.get('QUERY_STRING', ''))
    if 'new_ip' in parameters:
        newip = escape(parameters['new_ip'][0])
    else:
        newip = 'Unknown!'
    start_response('200 OK', [('Content-Type', 'text/html')])
    # Look up DNS values
    oldip = socket.gethostbyname('revragnarok.com') # Yes, IPv4 only
    # Compare
    changed = ''
    if newip != oldip:
        changed = 'IP changed from {0} to {1}.'.format(oldip, newip)
    if changed:
        e_from = '[email protected]'
        e_to = ['[email protected]']
        e_msg = """Subject: IP Address change detected

{0}""".format(changed)
        # I considered a try/catch block here, but then what would I do?
        smtpObj = smtplib.SMTP('localhost')
        smtpObj.sendmail(e_from, e_to, e_msg)
    else:
        changed = 'IP is {0} (unchanged).'.format(newip)
    return [changed]
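

The script above is Python 2. `parse_qs` and `escape` were removed from the `cgi` module in Python 3.8, so on a newer interpreter the parameter handling would need different imports. A rough sketch of that part only, with no DNS lookup or email; `requested_ip` and `describe` are made-up helper names, not part of the original script.

```python
from html import escape
from urllib.parse import parse_qs


def requested_ip(environ):
    """Pull the new_ip parameter out of a WSGI environ dict."""
    parameters = parse_qs(environ.get('QUERY_STRING', ''))
    if 'new_ip' in parameters:
        return escape(parameters['new_ip'][0])
    return 'Unknown!'


def describe(oldip, newip):
    """Build the same status message the script returns or emails."""
    if newip != oldip:
        return 'IP changed from {0} to {1}.'.format(oldip, newip)
    return 'IP is {0} (unchanged).'.format(newip)


if __name__ == '__main__':
    # 203.0.113.7 and 198.51.100.1 are documentation-only example addresses.
    ip = requested_ip({'QUERY_STRING': 'new_ip=203.0.113.7'})
    print(describe('198.51.100.1', ip))
```

Two differences to keep in mind: `html.escape` also escapes quote characters by default, and a Python 3 WSGI application has to return bytes, so the final line would become something like `return [changed.encode('utf-8')]`.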

And don't forget, if you use SELinux, fix the security context on the script, and allow the web server to send email:

[[email protected] wsgi]# ls -Z IP.wsgi 
-rw-r--r--. root root system_u:object_r:httpd_sys_script_exec_t:s0 IP.wsgi
[[email protected] wsgi]# setsebool -P httpd_can_sendmail 1