foreword |
|
xi | |
preface |
|
xiii | |
about the cover illustration |
|
xviii | |
PART I FOUNDATIONS |
|
1 | (78) |
|
Data, data munging, and Perl |
|
|
3 | (15) |
|
|
4 | (3) |
|
|
4 | (1) |
|
|
5 | (1) |
|
|
6 | (1) |
|
|
6 | (1) |
|
|
6 | (1) |
|
Why is data munging important? |
|
|
7 | (2) |
|
Accessing corporate data repositories |
|
|
7 | (1) |
|
Transferring data between multiple systems |
|
|
7 | (1) |
|
Real-world data munging examples |
|
|
8 | (1) |
|
Where does data come from? Where does it go? |
|
|
9 | (3) |
|
|
9 | (1) |
|
|
10 | (1) |
|
|
11 | (1) |
|
|
11 | (1) |
|
What forms does data take? |
|
|
12 | (2) |
|
|
12 | (1) |
|
|
13 | (1) |
|
|
13 | (1) |
|
|
13 | (1) |
|
|
14 | (2) |
|
|
15 | (1) |
|
Why is Perl good for data munging? |
|
|
16 | (1) |
|
|
17 | (1) |
|
|
17 | (1) |
|
General munging practices |
|
|
18 | (21) |
|
Decouple input, munging, and output processes |
|
|
19 | (1) |
|
Design data structures carefully |
|
|
20 | (5) |
|
Example: the CD file revisited |
|
|
20 | (5) |
|
Encapsulate business rules |
|
|
25 | (6) |
|
Reasons to encapsulate business rules |
|
|
26 | (1) |
|
Ways to encapsulate business rules |
|
|
26 | (1) |
|
|
27 | (1) |
|
|
28 | (3) |
|
Use UNIX "filter" model |
|
|
31 | (5) |
|
Overview of the filter model |
|
|
31 | (1) |
|
Advantages of the filter model |
|
|
32 | (4) |
|
|
36 | (2) |
|
What to write to an audit trail |
|
|
36 | (1) |
|
|
37 | (1) |
|
Using the UNIX system logs |
|
|
37 | (1) |
|
|
38 | (1) |
|
|
38 | (1) |
|
|
39 | (18) |
|
|
40 | (7) |
|
|
40 | (1) |
|
|
41 | (1) |
|
|
42 | (1) |
|
|
43 | (3) |
|
The Guttman-Rosler transform |
|
|
46 | (1) |
|
Choosing a sort technique |
|
|
46 | (1) |
|
|
47 | (2) |
|
|
47 | (2) |
|
|
49 | (2) |
|
|
51 | (2) |
|
|
53 | (2) |
|
|
55 | (1) |
|
|
56 | (1) |
|
|
57 | (22) |
|
String handling functions |
|
|
58 | (2) |
|
|
58 | (1) |
|
Finding strings within strings (index and rindex) |
|
|
59 | (1) |
|
|
60 | (1) |
|
|
60 | (17) |
|
What are regular expressions? |
|
|
60 | (1) |
|
Regular expression syntax |
|
|
61 | (4) |
|
Using regular expressions |
|
|
65 | (5) |
|
Example: translating from English to American |
|
|
70 | (3) |
|
More examples: /etc/passwd |
|
|
73 | (3) |
|
|
76 | (1) |
|
|
77 | (1) |
|
|
78 | (1) |
PART II DATA MUNGING |
|
79 | (68) |
|
|
81 | (15) |
|
|
82 | (5) |
|
|
82 | (2) |
|
|
84 | (1) |
|
|
85 | (2) |
|
|
87 | (7) |
|
Converting the character set |
|
|
87 | (1) |
|
|
88 | (2) |
|
Converting number formats |
|
|
90 | (4) |
|
|
94 | (1) |
|
|
95 | (1) |
|
|
96 | (31) |
|
Simple record-oriented data |
|
|
97 | (11) |
|
Reading simple record-oriented data |
|
|
97 | (3) |
|
Processing simple record-oriented data |
|
|
100 | (2) |
|
Writing simple record-oriented data |
|
|
102 | (3) |
|
|
105 | (3) |
|
|
108 | (2) |
|
|
108 | (1) |
|
|
109 | (1) |
|
|
110 | (4) |
|
Example: a different CD file |
|
|
111 | (2) |
|
Special values for $/ |
|
|
113 | (1) |
|
Special problems with date fields |
|
|
114 | (9) |
|
Built-in Perl date functions |
|
|
114 | (6) |
|
|
120 | (1) |
|
|
121 | (1) |
|
Choosing between date modules |
|
|
122 | (1) |
|
Extended example: web access logs |
|
|
123 | (3) |
|
|
126 | (1) |
|
|
126 | (1) |
|
Fixed-width and binary data |
|
|
127 | (20) |
|
|
128 | (11) |
|
|
128 | (7) |
|
|
135 | (4) |
|
|
139 | (5) |
|
|
140 | (3) |
|
Reading and writing MP3 files |
|
|
143 | (1) |
|
|
144 | (1) |
|
|
145 | (2) |
PART III SIMPLE DATA PARSING |
|
147 | (78) |
|
|
149 | (14) |
|
|
150 | (4) |
|
Example: metadata in the CD file |
|
|
150 | (2) |
|
Example: reading the expanded CD file |
|
|
152 | (2) |
|
|
154 | (4) |
|
|
154 | (3) |
|
Limitations of regular expressions |
|
|
157 | (1) |
|
|
158 | (4) |
|
An introduction to parsers |
|
|
158 | (3) |
|
|
161 | (1) |
|
|
162 | (1) |
|
|
162 | (1) |
|
|
163 | (12) |
|
Extracting HTML data from the World Wide Web |
|
|
164 | (1) |
|
|
165 | (2) |
|
Example: simple HTML parsing |
|
|
165 | (2) |
|
|
167 | (5) |
|
|
167 | (2) |
|
|
169 | (2) |
|
HTML::TreeBuilder and HTML::Element |
|
|
171 | (1) |
|
Extended example: getting weather forecasts |
|
|
172 | (2) |
|
|
174 | (1) |
|
|
174 | (1) |
|
|
175 | (34) |
|
|
176 | (2) |
|
|
176 | (1) |
|
|
176 | (2) |
|
Parsing XML with XML::Parser |
|
|
178 | (13) |
|
Example: parsing weather.xml |
|
|
178 | (1) |
|
|
179 | (2) |
|
|
181 | (7) |
|
|
188 | (3) |
|
|
191 | (2) |
|
Example: parsing XML using XML::DOM |
|
|
191 | (2) |
|
Specialized parsers--XML::RSS |
|
|
193 | (4) |
|
|
193 | (1) |
|
|
193 | (2) |
|
Example: creating an RSS file with XML::RSS |
|
|
195 | (1) |
|
Example: parsing an RSS file with XML::RSS |
|
|
196 | (1) |
|
Producing different document formats |
|
|
197 | (11) |
|
|
197 | (1) |
|
XML document transformation script |
|
|
198 | (7) |
|
Using the XML document transformation script |
|
|
205 | (3) |
|
|
208 | (1) |
|
|
208 | (1) |
|
Building your own parsers |
|
|
209 | (16) |
|
Introduction to Parse::RecDescent |
|
|
210 | (2) |
|
Example: parsing simple English sentences |
|
|
210 | (2) |
|
|
212 | (5) |
|
Example: parsing a Windows INI file |
|
|
212 | (1) |
|
Understanding the INI file grammar |
|
|
213 | (1) |
|
Parser actions and the @item array |
|
|
214 | (1) |
|
Example: displaying the contents of @item |
|
|
214 | (2) |
|
Returning a data structure |
|
|
216 | (1) |
|
Another example: the CD data file |
|
|
217 | (6) |
|
Understanding the CD grammar |
|
|
218 | (1) |
|
Testing the CD file grammar |
|
|
219 | (1) |
|
|
220 | (3) |
|
Other features of Parse::RecDescent |
|
|
223 | (1) |
|
|
224 | (1) |
|
|
224 | (1) |
PART IV THE BIG PICTURE |
|
225 | (7) |
|
|
227 | (5) |
|
|
228 | (1) |
|
The usefulness of data munging |
|
|
228 | (1) |
|
|
228 | (1) |
|
The usefulness of the Perl community |
|
|
229 | (1) |
|
|
229 | (3) |
|
|
229 | (1) |
|
|
230 | (1) |
|
Know where to go for more information |
|
|
230 | (2) |
appendix A Modules reference |
|
232 | (22) |
appendix B Essential Perl |
|
254 | (19) |
index |
|
273 | |