Coordinate CSV combiner and string to float

One of my field colleagues asked me to combine three coordinate csv’s for him. Initially it was a seemingly simple task, but then there were a lot of duplicates; because the duplicates sometimes had only a duplicate name, but updated coordinates (and no way of knowing which was the most current value at this stage) I couldn’t just keep one and bin the other. I also simply couldn’t rename one ie 1000 and 1000_a or that would prolong the problem.
Another problem I had was converting extracted values to float in order to run comparisons. Yes, Python can automatically recognize and convert strings to float – but sometimes this just doesn’t work ie if Python cannot recognise the string is a float!
My code is below in the hope that it helps someone else. A few notes before you go picking my stuff apart.

  • I have no error catching, there are definitely a few places to test for exceptions. This tarted as a very small and simple script and grew from there out of necessity without a lot of planning.
  • There is probably a better way to run the comparisons, I’d be keen to hear your feedback on this.
  • My file handling could probably be optimised as well.
  • Send me some feedback here on or Facebook (on the ArcPy group) or Twitter (HopBuddyHop)

Lines 1 – 14 just set the script up

<pre>__version__ = "0.0.1"
import os

# Takes a compiled CSV of control points, removes duplicates, exports cleaned CSV.
INPUT_DIR = r"C:\Control_checker"
HEADING_COL_1 = "P"
HEADING_COL_2 = "E"
HEADING_COL_3 = "N"
HEADING_COL_4 = "EL"
HEADING_COL_5 = "CODE"

control_list = {}

The first function you come across here is the function to compare two sets of coordinates. If the eastings, northings, elevations all match exactly when rounded to 3 decimal places, the function simply returns true. It could be improved by letting the user dictate the decimal places.

def check_coords_match(orig_val, new_val):
    # Only checks to three decimal places.
    if orig_val[1] == new_val[1] and round(orig_val[2], 3) == round(new_val[2], 3) and\
                    round(orig_val[3], 3) == round(new_val[3], 3):
        return True

because we’re dealing with CSV’s, there’s going to be a lot of commas and newline characters (\n) so I created a little function to remove these from each line.

def format_the_line(line):
    # Clean out the commas and the new line at end.
    line = line.rstrip('\n')
    line = line.split(',')
    return line

This next function is where some magic happens. At this stage Python still hasn’t recognised the numbers to be numbers, it thinks they’re strings so will not automatically convert them; making them useless unless there’s a way to convert them. We start by splitting the string at the decimal point, our integer is pretty easy to create using the first element of the split array. The decimal is first converted to a float, and then, because we don’t know how many zero’s it has (ie 100.1000001 would be [100, 1000001] we cannot simply divide it by ten to make the decimal, or divide it by 100 etc. We simply use a while statement to loop through and continue dividing the number by ten, until it is no longer greater than 1. Then, we join our new number with our new decimal, and we have successfully converted a string to a float. 1000001 > 100000.1 > 10000.01 > 1000.001 > 100.0001 > 10.00001 > 1.000001> 0.1000001

reviewing the code I noticed a pretty significant issue here, if the decimal starts with a zero, (as in the number 10.001 = .001) the converter is not going to work. I’ve now fixed this using a decimal counter variable and while statement.

def format_the_string_to_number(string_to_convert):
    # Convert the string '1234.000' to the number 1234.000
    new_number = string_to_convert.split('.')
    decimal_loop = len(string_to_convert) - len(new_number[0]) - 1

    new_int = float(new_number[0])
    # If there's two integers, the second one is a decimal.
    if len(new_number) > 1:
        new_decimal = float(new_number[1])
        while decimal_loop:
            new_decimal = new_decimal / 10
            decimal_loop -= 1
        # print (new_decimal)
        # convert the second integer to decimals.
        # while new_decimal > 1.0:
        #     new_decimal = new_decimal / 10
        return new_int + new_decimal
    else:
        return new_int

The final function ties the above together, for each line in the csv being checked, it converts the easting northing and elevation to a float, it then checks to see if the point number / name exists – if it does exist – the coordinates are then compared to see if it is a duplicate point, or simply two or more points sharing the same name (no action is taken in this instance, not even renaming the point) This could be improved by not formatting the E, N, El unless absolutely necessary – ie, after it has found a duplicate point name. And, by creating a list of different coordinated points that share the same point name.

def check_csv_for_dupes(control_file_input):
    for line in control_file_input:
        line = format_the_line(line)
        line[1] = format_the_string_to_number(line[1])  # East
        line[2] = format_the_string_to_number(line[2])  # North
        line[3] = format_the_string_to_number(line[3])  # Elevation

        # Check if the same point name is already in the database.
        if line[0] in control_list:
            # If it is, see if the E, N, El, match exactly.
            if check_coords_match(control_list[line[0]], line):
                pass  # if it's an exact match, we're not going to add it again.
            else:
                # Add the duplicate, give it a temp name.
                duped = line[0] + str(1) + "_dup"
                control_list[duped] = line
        else:
            # If it's not in the db then we'll just add it. No further checks required.
            control_list[line[0]] = line

Next up we’re looping through the working folder, grabbing the CSV’s and running the duplicate checks.

# Loop through each CSV in the directory.
for x in os.listdir(INPUT_DIR):
    if x[-4:] == ".csv":
        control_file = open(os.path.join(INPUT_DIR, x), 'r')
        check_csv_for_dupes(control_file)
        control_file.close()
    else:
        pass

Finally we ask for a new version and out put the new CSV. I’m sure there’s plenty of ways to improve this part!

new_control_version = input("New version please homie: ")

# Write to the new control file.
f = open(r"C:\Control_checker\Control" + str(new_control_version) + ".csv", "a")
if raw_input("Do you want headings in the CSV? (y or n): ") == 'y':
    heading = "%s, %s, %s, %s, %s" % (HEADING_COL_1, HEADING_COL_2, HEADING_COL_3, HEADING_COL_4, HEADING_COL_5)
    f.write(heading + "\n")

for x in control_list:
    f.write(control_list[x][0] + ',' + str(control_list[x][1]) + ','
             + str(control_list[x][2]) + ',' + str(control_list[x][3]) + ',' + control_list[x][4] + "\n")

f.close()

So that’s it, if you have any comments or suggestions I’d love to hear them. The full code is below for you to help yourself to.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
__version__ = "0.0.1"
import os

# Takes a compiled CSV of control points, removes duplicates, exports cleaned CSV.
INPUT_DIR = r"C:\Control_checker"
HEADING_COL_1 = "P"
HEADING_COL_2 = "E"
HEADING_COL_3 = "N"
HEADING_COL_4 = "EL"
HEADING_COL_5 = "CODE"

control_list = {}


def check_coords_match(orig_val, new_val):
    # Only checks to three decimal places.
    if orig_val[1] == new_val[1] and round(orig_val[2], 3) == round(new_val[2], 3) and\
                    round(orig_val[3], 3) == round(new_val[3], 3):
        return True


def format_the_line(line):
    # Clean out the commas and the new line at end.
    line = line.rstrip('\n')
    line = line.split(',')
    return line


def format_the_string_to_number(string_to_convert):
    # Convert the string '1234.000' to the number 1234.000
    new_number = string_to_convert.split('.')
    decimal_loop = len(string_to_convert) - len(new_number[0]) - 1

    new_int = float(new_number[0])
    # If there's two integers, the second one is a decimal.
    if len(new_number) > 1:
        new_decimal = float(new_number[1])
        while decimal_loop:
            new_decimal = new_decimal / 10
            decimal_loop -= 1
        # print (new_decimal)
        # convert the second integer to decimals.
        # while new_decimal > 1.0:
        #     new_decimal = new_decimal / 10
        return new_int + new_decimal
    else:
        return new_int


def check_csv_for_dupes(control_file_input):
    for line in control_file_input:
        line = format_the_line(line)
        line[1] = format_the_string_to_number(line[1])  # East
        line[2] = format_the_string_to_number(line[2])  # North
        line[3] = format_the_string_to_number(line[3])  # Elevation

        # Check if the same point name is already in the database.
        if line[0] in control_list:
            # If it is, see if the E, N, El, match exactly.
            if check_coords_match(control_list[line[0]], line):
                pass  # if it's an exact match, we're not going to add it again.
            else:
                # Add the duplicate, give it a temp name.
                duped = line[0] + str(1) + "_dup"
                control_list[duped] = line
        else:
            # If it's not in the db then we'll just add it. No further checks required.
            control_list[line[0]] = line


# Loop through each CSV in the directory.
for x in os.listdir(INPUT_DIR):
    if x[-4:] == ".csv":
        control_file = open(os.path.join(INPUT_DIR, x), 'r')
        check_csv_for_dupes(control_file)
        control_file.close()
    else:
        pass

new_control_version = input("New version please homie: ")

# Write to the new control file.
f = open(r"C:\Control_checker\Control" + str(new_control_version) + ".csv", "a")
if raw_input("Do you want headings in the CSV? (y or n): ") == 'y':
    heading = "%s, %s, %s, %s, %s" % (HEADING_COL_1, HEADING_COL_2, HEADING_COL_3, HEADING_COL_4, HEADING_COL_5)
    f.write(heading + "\n")

for x in control_list:
    f.write(control_list[x][0] + ',' + str(control_list[x][1]) + ','
             + str(control_list[x][2]) + ',' + str(control_list[x][3]) + ',' + control_list[x][4] + "\n")

f.close()

Author: JR

Leave a Reply