Regular expressions with Python - working with a CSV file and Python's re module
The following assumes that you have downloaded this data about German municipalities in CSV format and saved it as test.csv. If the Python code below is saved as myretest.py in the same directory as the CSV file, run the script as follows:
python myretest.py
Example: using Python's re module to convert German municipality keys into a partial IPv6 representation
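The script below reads the decimal municipality keys from the CSV file and converts each one into a 48-bit hexadecimal representation. To guarantee a fixed width, each key is left-padded with the digit 9 to 14 decimal digits, which always yields exactly 12 hexadecimal digits. The output, three colon-separated groups of four hex digits, may be used as part of the host part of an IPv6 address.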
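To make the padding step concrete, here is a minimal sketch that converts a single key in isolation. The function name `key_to_ipv6_groups` and the sample key are illustrative assumptions and not part of the original script; the padding and splitting logic mirrors what the full script does.

```python
import re


def key_to_ipv6_groups(g_key: str) -> str:
    """Convert one decimal municipality key into three hex groups.

    Hypothetical helper for illustration only.
    """
    # Left-pad with the digit 9 to a width of 14 decimal digits.
    padded = "{:9>14}".format(g_key)
    # Any 14-digit number starting with 9 lies between 16**11 and 16**12,
    # so its hex form always has exactly 12 digits (48 bits).
    hex_digits = format(int(padded), "x")
    # Split the 12 hex digits into groups of four and join them with colons.
    return ":".join(re.findall("....", hex_digits))


# Made-up sample key, not necessarily a real municipality.
print(key_to_ipv6_groups("1001000"))
```

Because the padding digit is 9 rather than 0, short and long keys both produce the same output width, so no further zero-filling of the hex string is needed.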
import csv
import re


def main():
    with open("test.csv", "r", encoding="iso8859_15", newline="") as csvfile:
        # Detect the CSV dialect from the first 4 KiB, expecting ";" as delimiter.
        dialect_sniffed = csv.Sniffer().sniff(csvfile.read(4096), ";")
        csvfile.seek(0)
        fieldnames = ["date", "g-key", "name", "area-size", "garbage"]
        # Surplus columns are collected under "garbage";
        # missing columns are filled with "FILLER".
        contents = csv.DictReader(csvfile, fieldnames,
                                  "garbage", "FILLER", dialect_sniffed)
        for row in contents:
            # Key length, number of entries, administrative level:
            # {2,2} 16 Bundesland (federal state)
            # {3,3} 35 Statistische Region, Regierungsbezirk, Lüneburg
            # {5,5} 471 Landkreis, Kreis, kreisfreie Stadt, Stadtkreis
            # {8,8} 13391 Gemeinde (municipality)
            m = re.search("^[0-9]{1,8}$", row["g-key"])
            if m is not None:
                # print(row["g-key"], row["name"], row["area-size"])
                # Pad with leading 9s to 14 decimal digits, then convert to hex.
                padded_hex_g_key = hex(int("{:9>14}".format(row["g-key"])))
                # Drop the "0x" prefix and split into groups of four hex digits.
                ipv6_host_part = re.findall("....", padded_hex_g_key[2:])
                print(":".join(ipv6_host_part))


if __name__ == "__main__":
    main()
    print("Script ran standalone and was not imported.")
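The row filtering can also be tried without the downloaded file. The following is a minimal sketch that feeds a few made-up lines through the same `csv.DictReader` arguments and the same regular expression. The sample rows, the `valid_rows` name, and the fixed `delimiter=";"` (used here instead of sniffing) are assumptions for illustration and do not reflect the real data.

```python
import csv
import io
import re

# Hypothetical sample in the assumed column layout; not taken from the real file.
SAMPLE = (
    "Stand;Schluessel;Name;Flaeche\n"
    "31.12.2020;01;Example State;100,00;extra\n"
    "31.12.2020;01001000;Example Town;56,73\n"
)


def valid_rows(csvfile):
    """Yield only the rows whose g-key consists of 1 to 8 decimal digits."""
    fieldnames = ["date", "g-key", "name", "area-size", "garbage"]
    reader = csv.DictReader(csvfile, fieldnames, restkey="garbage",
                            restval="FILLER", delimiter=";")
    for row in reader:
        # The header line fails this check and is skipped.
        if re.search("^[0-9]{1,8}$", row["g-key"]) is not None:
            yield row


rows = list(valid_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["g-key"], row["name"], row["garbage"])
```

Rows with fewer columns than field names get `"FILLER"` in the missing fields, which is why the second data row still has a `"garbage"` entry.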