4 Writing Structured Programs By now you will have a sense of the capabilities of the Python programming language for processing natural language. However, if you’re new to Python or to programming, you may still be wrestling with Python and the technique of speed reading and memory development for children feel like you are in full control yet. How can you write well-structured, readable programs that you and others will be able to re-use easily? How do the fundamental building blocks work, such as loops, functions and assignment?
What are some of the pitfalls with Python programming and how can you avoid them? Along the way, you will consolidate your knowledge of fundamental programming constructs, learn more about using features of the Python language in a natural and concise way, and learn some useful techniques in visualizing natural language data. In the other chapters of this book, we have organized the programming concepts as dictated by the needs of NLP. Here we revert to a more conventional approach where the material is more closely tied to the structure of the programming language.
There’s not room for a complete presentation of the language, so we’ll just focus on the language constructs and idioms that are most important for NLP. However, there are some surprising subtleties here. However, assignment statements do not always involve making copies in this way. Assignment always copies the value of an expression, but a value is not always what you might expect it to be. To understand what is going on here, we need to know how lists are stored in the computer’s memory.
Observe that changing one of the items inside our nested list of lists changed them all. This is because each of the three elements is actually just a reference to one and the same list in memory. Now modify one of the elements of the list, and observe that all the elements are changed. We began with a list containing three references to a single empty list object. This last step modified one of the three object references inside the nested list. It is crucial to appreciate this difference between modifying an object via an object reference, and overwriting an object reference.
This copies the object references inside the list. Equality Python provides two ways to check that a pair of items are the same. We can use it to verify our earlier observations about objects. Now let’s put a new python in this nest. This reveals that the second item of the list has a distinct identifier. If you try running this code snippet yourself, expect to see different numbers in the resulting list, and also the interloper may be in a different position. Having two kinds of equality might seem strange.
However, it’s really just the type-token distinction, familiar from natural language, here showing up in a programming language. 2 Sequences So far, we have seen two kinds of sequence object: strings and lists. We’ve actually seen them in the previous chapters, and sometimes referred to them as “pairs”, since there were always two members. However, tuples can have any number of members. Tuples are constructed using the comma operator. Parentheses are a more general feature of Python syntax, designed for grouping. Notice in this code sample that we computed multiple values on a single line, separated by commas.
These comma-separated expressions are actually just tuples — Python allows us to omit the parentheses around tuples if there is no ambiguity. When we print a tuple, the parentheses are always displayed. By using tuples in this way, we are implicitly aggregating items together. The sequence functions illustrated in 4. We can convert between these sequence types. Red lorry, yellow lorry, red lorry, yellow lorry.