{ "metadata": { "name": "", "signature": "sha256:b9c7ba4db606d9663a26b467ed2197a24f638117b05676f876884e65ea801318" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Strings in Python" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "What is a string?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A \"string\" is a sequence of characters of arbitrary length.\n", "Strings are immutable: once created, a string cannot be changed. Operations that seem to modify a string, such as upper() below, actually build and return a new string and leave the original untouched." ] }, { "cell_type": "code", "collapsed": false, "input": [ "s1 = 'Godzilla'\n", "print s1, s1.upper()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Godzilla GODZILLA\n" ] } ], "prompt_number": 2 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "String literals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A \"literal\" is a string value written out directly in your code, like \"Godzilla\" below. You can type it with either single or double quotes. When Python echoes a string back, it normally uses single quotes, but that's just a display convention." ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"Godzilla\"" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "'Godzilla'" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Single and double quotes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generally, a string literal can be in single ('), double (\"), or triple (''' or \"\"\") quotes. Single and double quotes are equivalent; use whichever you prefer (but be consistent). If you need a single or double quote inside your literal, surround the literal with the other type of quote, or escape the quote with a backslash."
] }, { "cell_type": "code", "collapsed": false, "input": [ "\"Godzilla's a kaiju.\"" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "\"Godzilla's a kaiju.\"" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "'Godzilla\\'s a kaiju.'" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "\"Godzilla's a kaiju.\"" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "'We call him... \"Godzilla\".'" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "'We call him... \"Godzilla\".'" ] } ], "prompt_number": 6 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Triple quotes (''' or \"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Triple quotes let a string literal span several lines and contain single and double quotes freely. Their best-known use is documenting Python modules, functions, and classes (docstrings). We won't discuss them further here." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Raw strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A raw string is written with an r prefix, as in r'...'. Python doesn't interpret backslash escapes inside it: every backslash stays in the string as an ordinary character. Use raw strings when you have a complicated string that you don't want to clutter with lots of doubled backslashes. Regular expressions and Windows file paths are typical examples."
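, "\n", "\n", "Here's a quick check, using a made-up Windows-style path. Without the r prefix, the `\\n` and `\\t` in `'C:\\new\\table.txt'` would be read as a newline and a tab. With the prefix, the raw literal is equal to an ordinary literal in which every backslash is doubled." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# A made-up Windows-style path, used only to show how backslashes are handled\n", "raw_path = r'C:\\new\\table.txt'\n", "escaped_path = 'C:\\\\new\\\\table.txt'  # same text, every backslash doubled\n", "print(raw_path == escaped_path)\n", "print(len(r'\\n'))  # a backslash plus an n: 2 characters\n", "print(len('\\n'))   # one newline character" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now compare how print() treats the same sentence with and without the r prefix:"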
] }, { "cell_type": "code", "collapsed": false, "input": [ "print('This is a\\ncomplicated string with newline escapes in it.')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "This is a\n", "complicated string with newline escapes in it.\n" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "print(r'This is a\\ncomplicated string with newline escapes in it.')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "This is a\\ncomplicated string with newline escapes in it.\n" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "String objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A string object is any string value you create in Python, for example by assigning a literal to a variable." ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju = 'Godzilla'\n", "print(kaiju)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Godzilla\n" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "'Godzilla'" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the print() output shows no quotes, while entering just the variable name does. That is a Python output convention. print() displays the str() form of a value, which is meant for human readers. Entering a bare name at the prompt displays its repr() form, which shows the value the way you would type it as a Python literal."
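, "\n", "\n", "The difference is easiest to see with a string that contains a special character. In this small example (the two-line string is made up for illustration), str() gives the text itself, while repr() gives a quoted literal with the newline written as the escape `\\n`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "two_lines = 'Godzilla\\nMothra'\n", "print(str(two_lines))   # the newline is printed as a real line break\n", "print(repr(two_lines))  # the newline is shown as the escape \\n, inside quotes" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Applied to our one-word string, repr() simply adds the quotes:"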
] }, { "cell_type": "code", "collapsed": false, "input": [ "repr(kaiju)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "\"'Godzilla'\"" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "print(repr(kaiju))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "'Godzilla'\n" ] } ], "prompt_number": 22 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "String operators" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The + operator adds numbers but concatenates strings, and Python refuses to guess which you mean when the two types are mixed. This matters whenever you read data from a file: no matter what the data represents, it arrives as text. To use it as a number, you have to convert it explicitly, for example with int() or float()." ] }, { "cell_type": "code", "collapsed": false, "input": [ "one = 1\n", "two = '2'\n", "print one, two, one + two" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 2" ] }, { "ename": "TypeError", "evalue": "unsupported operand type(s) for +: 'int' and 'str'", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mone\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0mtwo\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'2'\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[1;32mprint\u001b[0m \u001b[0mone\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mtwo\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mone\u001b[0m \u001b[1;33m+\u001b[0m \u001b[0mtwo\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: unsupported operand type(s) for +: 'int' and 'str'" ] } ],
"prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "one = 1\n", "two = int('2')\n", "print one, two, one + two" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 2 3\n" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "num1 = 1.1\n", "num2 = float('2.2')\n", "print num1, num2, num1 + num2" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1.1 2.2 3.3\n" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "int() also accepts an optional second argument, the base. This lets you convert hexadecimal, octal, binary, or any other base from 2 to 36. A 0x prefix is allowed with base 16, and a leading 0 is allowed with base 8." ] }, { "cell_type": "code", "collapsed": true, "input": [ "print int('FF', 16)\n", "print int('0xff', 16)\n", "print int('777', 8)\n", "print int('0777', 8)\n", "print int('222', 7)\n", "print int('110111001', 2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "255\n", "255\n", "511\n", "511\n", "114\n", "441\n" ] } ], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the conversion cannot be done, Python raises a ValueError."
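, "\n", "\n", "When you're reading numbers from a file, you often want to handle that exception rather than let it stop the program. Here's a minimal sketch using try/except. The helper name to_int_or_none and the choice of None as the fallback are just illustrations; neither is built into Python." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def to_int_or_none(text, base=10):\n", "    # Hypothetical helper: convert text to an int, or return None if that fails\n", "    try:\n", "        return int(text, base)\n", "    except ValueError:\n", "        return None\n", "\n", "print(to_int_or_none('FF', 16))    # valid hexadecimal\n", "print(to_int_or_none('0xGG', 16))  # G is not a hexadecimal digit" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without the try/except, the failed conversion looks like this:"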
] }, { "cell_type": "code", "collapsed": false, "input": [ "print int('0xGG', 16)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "invalid literal for int() with base 16: '0xGG'", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mprint\u001b[0m \u001b[0mint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'0xGG'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m16\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: invalid literal for int() with base 16: '0xGG'" ] } ], "prompt_number": 34 }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Concatenation" ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju1 = 'Godzilla'\n", "kaiju2 = 'Mothra'\n", "kaiju1 + ' versus ' + kaiju2" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "'Godzilla versus Mothra'" ] } ], "prompt_number": 24 }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Repetition" ] }, { "cell_type": "code", "collapsed": false, "input": [ "'Run away! ' * 3" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "'Run away! Run away! Run away! '" ] } ], "prompt_number": 27 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "String keywords" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "in" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The _in_ operator tests whether one string occurs anywhere inside another.\n", "\n", "NOTE: This _particular_ statement is false regardless of how the statement is evaluated!
:^)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "'Godzilla' in 'Godzilla vs Gamera'" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 109, "text": [ "True" ] } ], "prompt_number": 109 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "String functions" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "len()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "len(kaiju)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "8" ] } ], "prompt_number": 28 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "String methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember - methods are functions attached to objects, accessed via the 'dot' notation." ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Basic formatting and manipulation" ] }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "capitalize()/lower()/upper()/swapcase()/title()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.capitalize()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "'Godzilla'" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.lower()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "'godzilla'" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.upper()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "'GODZILLA'" ] } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.swapcase()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, 
"output_type": "pyout", "prompt_number": 32, "text": [ "'gODZILLA'" ] } ], "prompt_number": 32 }, { "cell_type": "code", "collapsed": false, "input": [ "'godzilla, king of the monsters'.title()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ "'Godzilla, King Of The Monsters'" ] } ], "prompt_number": 34 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "center()/ljust()/rjust()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.center(20, '*')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 35, "text": [ "'******Godzilla******'" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.ljust(20, '*')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 36, "text": [ "'Godzilla************'" ] } ], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju.rjust(20, '*')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 37, "text": [ "'************Godzilla'" ] } ], "prompt_number": 37 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "expandtabs()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "tabbed_kaiju = '\\tGodzilla'\n", "print('[' + tabbed_kaiju + ']')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[\tGodzilla]\n" ] } ], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "print('[' + tabbed_kaiju.expandtabs(16) + ']')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[                Godzilla]\n" ] } ], "prompt_number": 42 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "join()" ] }, { "cell_type": "code",
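"collapsed": false, "input": [ "# join() is called on the separator string and glues together a list (or any\n", "# iterable) of strings. Every item must already be a string, so numbers need\n", "# converting first. A small sketch with made-up numbers, using str() on each item:\n", "counts = [3, 2, 1]\n", "print(' + '.join(str(n) for n in counts))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code",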
"collapsed": false, "input": [ "' vs '.join(['Godzilla', 'Hedorah'])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 44, "text": [ "'Godzilla vs Hedorah'" ] } ], "prompt_number": 44 }, { "cell_type": "code", "collapsed": false, "input": [ "','.join(['Godzilla', 'Mothra', 'King Ghidorah'])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 45, "text": [ "'Godzilla,Mothra,King Ghidorah'" ] } ], "prompt_number": 45 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "strip()/lstrip()/rstrip()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "' Godzilla '.strip()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 47, "text": [ "'Godzilla'" ] } ], "prompt_number": 47 }, { "cell_type": "code", "collapsed": false, "input": [ "'xxxGodzillayyy'.strip('xy')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 49, "text": [ "'Godzilla'" ] } ], "prompt_number": 49 }, { "cell_type": "code", "collapsed": false, "input": [ "' Godzilla '.lstrip()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 50, "text": [ "'Godzilla '" ] } ], "prompt_number": 50 }, { "cell_type": "code", "collapsed": false, "input": [ "' Godzilla '.rstrip()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 51, "text": [ "' Godzilla'" ] } ], "prompt_number": 51 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "partition()/rpartition()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla x Gigan'\n", "battle.partition(' x ')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 52, "text": [ "('Godzilla', 
' x ', 'Gigan')" ] } ], "prompt_number": 52 }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla and Jet Jaguar vs. Gigan and Megalon'\n", "battle.partition(' vs. ')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 53, "text": [ "('Godzilla and Jet Jaguar', ' vs. ', 'Gigan and Megalon')" ] } ], "prompt_number": 53 }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla vs Megalon vs Jet Jaguar'\n", "battle.partition('vs')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ "('Godzilla ', 'vs', ' Megalon vs Jet Jaguar')" ] } ], "prompt_number": 54 }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla vs Megalon vs Jet Jaguar'\n", "battle.rpartition('vs')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 55, "text": [ "('Godzilla vs Megalon ', 'vs', ' Jet Jaguar')" ] } ], "prompt_number": 55 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "replace()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla vs Mothra'\n", "battle.replace('Mothra', 'Anguirus')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 56, "text": [ "'Godzilla vs Anguirus'" ] } ], "prompt_number": 56 }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla vs a monster and another monster'\n", "battle.replace('monster', 'kaiju', 2)" ],
"language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 58, "text": [ "'Godzilla vs a kaiju and another kaiju and yet another monster'" ] } ], "prompt_number": 58 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "split()/rsplit()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "battle = 'Godzilla vs King Ghidorah vs Mothra'\n", "battle.split(' vs ')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 59, "text": [ "['Godzilla', 'King Ghidorah', 'Mothra']" ] } ], "prompt_number": 59 }, { "cell_type": "code", "collapsed": false, "input": [ "kaijus = 'Godzilla,Mothra,King Ghidorah'\n", "kaijus.split(',')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 60, "text": [ "['Godzilla', 'Mothra', 'King Ghidorah']" ] } ], "prompt_number": 60 }, { "cell_type": "code", "collapsed": false, "input": [ "kaijus = 'Godzilla Mothra King Ghidorah'\n", "kaijus.split()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 61, "text": [ "['Godzilla', 'Mothra', 'King', 'Ghidorah']" ] } ], "prompt_number": 61 }, { "cell_type": "code", "collapsed": false, "input": [ "kaijus = 'Godzilla,Mothra,King Ghidorah,Megalon'\n", "kaijus.rsplit(',', 2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 63, "text": [ "['Godzilla,Mothra', 'King Ghidorah', 'Megalon']" ] } ], "prompt_number": 63 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "splitlines()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaijus_in_lines = 'Godzilla\\nMothra\\nKing Ghidorah\\nEbirah'\n", "print(kaijus_in_lines)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Godzilla\n", "Mothra\n", "King 
Ghidorah\n", "Ebirah\n" ] } ], "prompt_number": 66 }, { "cell_type": "code", "collapsed": false, "input": [ "kaijus_in_lines.splitlines()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 67, "text": [ "['Godzilla', 'Mothra', 'King Ghidorah', 'Ebirah']" ] } ], "prompt_number": 67 }, { "cell_type": "code", "collapsed": false, "input": [ "kaijus_in_lines.splitlines(True)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 68, "text": [ "['Godzilla\\n', 'Mothra\\n', 'King Ghidorah\\n', 'Ebirah']" ] } ], "prompt_number": 68 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "zfill()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "age_of_Godzilla = 60\n", "age_string = str(age_of_Godzilla)\n", "print(age_string + ' ' + age_string.zfill(5))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "60 00060\n" ] } ], "prompt_number": 71 }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "String information" ] }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "isXXX()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print('Godzilla'.isalnum())\n", "print('*Godzilla*'.isalnum())\n", "print('Godzilla123'.isalnum())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "True\n", "False\n", "True\n" ] } ], "prompt_number": 85 }, { "cell_type": "code", "collapsed": false, "input": [ "print('Godzilla'.isalpha())\n", "print('Godzilla123'.isalpha())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "True\n", "False\n" ] } ], "prompt_number": 86 }, { "cell_type": "code", "collapsed": false, "input": [ "print('Godzilla'.isdigit())\n", "print('60'.isdigit())" ], "language": "python", "metadata": {}, "outputs":
[ { "output_type": "stream", "stream": "stdout", "text": [ "False\n", "True\n" ] } ], "prompt_number": 87 }, { "cell_type": "code", "collapsed": false, "input": [ "print('SpaceGodzilla'.isspace())\n", "print(' '.isspace())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "False\n", "True\n" ] } ], "prompt_number": 89 }, { "cell_type": "code", "collapsed": false, "input": [ "print('Godzilla'.islower())\n", "print('godzilla'.islower())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "False\n", "True\n" ] } ], "prompt_number": 88 }, { "cell_type": "code", "collapsed": false, "input": [ "print('Godzilla'.isupper())\n", "print('GODZILLA'.isupper())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "False\n", "True\n" ] } ], "prompt_number": 93 }, { "cell_type": "code", "collapsed": false, "input": [ "print('Godzilla vs Mothra'.istitle())\n", "print('Godzilla X Mothra'.istitle())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "False\n", "True\n" ] } ], "prompt_number": 91 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "count()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "monsters = 'Godzilla and Space Godzilla and MechaGodzilla'\n", "print 'There are', monsters.count('Godzilla'), 'Godzillas.'\n", "# Start counting just past the first 'Godzilla', so the original is skipped\n", "print 'There are', monsters.count('Godzilla', len('Godzilla')), 'pseudo-Godzillas.'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "There are 3 Godzillas.\n", "There are 2 pseudo-Godzillas.\n" ] } ], "prompt_number": 103 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "startswith()/endswith()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "king_kaiju = 'Godzilla'\n", "print
king_kaiju.startswith('God')\n", "print king_kaiju.endswith('lla')\n", "print king_kaiju.startswith('G')\n", "print king_kaiju.endswith('amera')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "True\n", "True\n", "True\n", "False\n" ] } ], "prompt_number": 105 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "find()/index()/rfind()/rindex()" ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju_string = 'Godzilla,Gamera,Gorgo,Space Godzilla'\n", "print 'The first Godz is at position', kaiju_string.find('Godz')\n", "print 'The second Godz is at position', kaiju_string.find('Godz', len('Godz'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "The first Godz is at position 0\n", "The second Godz is at position 28\n" ] } ], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju_string.index('Minilla')" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "substring not found", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mkaiju_string\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Minilla'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: substring not found" ] } ], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju_string.rindex('Godzilla')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 44, "text": [ "28" ] } ], "prompt_number": 44 }, { "cell_type": "heading", "level": 4, "metadata": {}, 
"source": [ "Advanced features" ] }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "decode()/encode()/translate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "encode() and decode() convert between text and encoded byte representations such as UTF-8 (for example, to and from Unicode strings). translate() replaces or deletes individual characters according to a translation table. They are rarely needed in scientific code." ] }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "String formatting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The % operator fills values into a template string, much like printf() in C or format statements in FORTRAN. For example, %s inserts the str() of a value and %d inserts an integer. There is a _lot_ more to this than I am showing here." ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju = 'Godzilla'\n", "age = 60\n", "print '%s is %d years old.' % (kaiju, age)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Godzilla is 60 years old.\n" ] } ], "prompt_number": 111 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "The _string_ module" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The _string_ module is the Python equivalent of \"junk DNA\" in living organisms. It's been around since the beginning, but evolution has superseded many of its functions with the string methods shown above. Some ancient code still relies on it, though, so the old parts have been left in....\n", "\n", "For modern code, the _string_ module does have some useful constants and functions."
] }, { "cell_type": "code", "collapsed": false, "input": [ "import string" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "print string.ascii_letters\n", "print string.ascii_lowercase\n", "print string.ascii_uppercase" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\n", "abcdefghijklmnopqrstuvwxyz\n", "ABCDEFGHIJKLMNOPQRSTUVWXYZ\n" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "print string.digits\n", "print string.hexdigits\n", "print string.octdigits" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0123456789\n", "0123456789abcdefABCDEF\n", "01234567\n" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "print string.letters\n", "print string.lowercase\n", "print string.uppercase" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\n", "abcdefghijklmnopqrstuvwxyz\n", "ABCDEFGHIJKLMNOPQRSTUVWXYZ\n" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "print string.printable\n", "print string.punctuation\n", "print string.whitespace" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n", "\r", "\u000b", "\f", "\n", "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\n", "\t\n", "\u000b", "\f", "\r", " \n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The _string_ module also provides the _Formatter_ class, which can be useful for sophisticated text formatting." 
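, " It uses the same {0}-style replacement fields as the format() method of strings, which is a newer alternative to the % operator shown earlier. Here's a quick comparison, using the same values as the formatting example above:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import string\n", "\n", "kaiju = 'Godzilla'\n", "age = 60\n", "# str.format() and string.Formatter().format() use the same {} field syntax\n", "print('{0} is {1} years old.'.format(kaiju, age))\n", "print(string.Formatter().format('{0} is {1} years old.', kaiju, age))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both lines print the same sentence. Formatter is mostly worth using when you subclass it to customize how fields are looked up or converted."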
] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Regular Expressions" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "What is a regular expression?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Regular expressions ('regexps') are essentially a mini-language for describing patterns in text. Much of the searching, splitting, and replacing shown above with string methods and operators can also be done with regular expressions, and the regular expression version is often more concise. But not always more readable....\n", "\n", "To use regular expressions, you have to import the _re_ module." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import re" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A very short, whirlwind tour of regular expressions" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Scanning" ] }, { "cell_type": "code", "collapsed": false, "input": [ "kaiju_truth = 'Godzilla is the King of the Monsters. Ebirah is also a monster, but looks like a giant lobster.'\n", "re.findall('Godz', kaiju_truth)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "['Godz']" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "print re.findall('(^.+) is the King', kaiju_truth)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Godzilla']\n" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a plain substring search like the first example, the _in_ operator is usually easier. Also note that regexps are case-sensitive by default." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print re.findall(r'\\.
(.+) is also', kaiju_truth)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Ebirah']\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "print re.findall('(.+) is also a (.+)', kaiju_truth)[0]\n", "print re.findall(r'\\. (.+) is also a (.+),', kaiju_truth)[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "('Godzilla is the King of the Monsters. Ebirah', 'monster, but looks like a giant lobster.')\n", "('Ebirah', 'monster')\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first pattern grabs far too much because .+ is _greedy_: it matches as many characters as it can, so the first group runs from the start of the text all the way to 'Ebirah'. The second pattern pins both groups down by anchoring on the period and space before the name and on the comma after the second group." ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Changing" ] }, { "cell_type": "code", "collapsed": false, "input": [ "some_kaiju = 'Godzilla, Space Godzilla, Mechagodzilla'\n", "print re.sub('Godzilla', 'Gamera', some_kaiju)\n", "print re.sub('(?i)Godzilla', 'Gamera', some_kaiju)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Gamera, Space Gamera, Mechagodzilla\n", "Gamera, Space Gamera, MechaGamera\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The (?i) at the start of the second pattern turns on case-insensitive matching, so the lowercase 'godzilla' inside 'Mechagodzilla' is replaced as well." ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "And so much more..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You could spend a whole day (or more) just learning about regular expressions. But they are incredibly useful and powerful, especially in the all-too-frequent drudgery of munging files from one format to another.\n", "\n", "If you apply the same pattern many times, you can also compile it once with re.compile() and reuse the resulting pattern object. This keeps the code tidy and can save a little time." ] } ], "metadata": {} } ] }