Data. It seems every network project I’ve encountered has been stuck at this stage. As a design software vendor we are very attuned to the importance of consistent and comprehensive data (software tends to complain loudly and relentlessly if data is poor), but typically my customers are already aware of the challenge. The planning team have clutter in the address set, the design team have a duct (conduit) set in three map projections, and the construction team discover the pole file simply isn’t true.

The underlying problem is an incorrect assumption. Just because the data is useful to its owner doesn’t mean it will be useful to you.

Not simply an ETL problem

The typical approach is to treat data acquisition as an ETL (Extract, Transform, Load) activity. We extract the data from the source database, transform it into the format and structure we need, and then load it into our database. When this falls short, the recurring quip at Biarri Networks is “But it made sense when we drew it on the whiteboard.” The solution lies in ‘intelligence’ during the Transform stage, but before solving the problem we need to understand it.

Think like an Historian

Where did this data come from, and how did the owner make use of it? Critical to this question is whether they had a human-in-the-loop (HiL) process when they used the data.

Let me explain by way of example.

Duplicates

Consider an address set provided by a postal service. A major irritation is finding duplicates among the addresses to be served. Sometimes these are perfect duplicates, such as two rows with an entry for 123 Main Street. Your data team have probably already configured the ETL to find and resolve these.

Other times the duplicate can be more subtle, such as 123 Main Street and 123 Main St (note the abbreviation). Or it could be a compound address such as Unit 4 123-127 Main Street. From the postal service’s point of view, the duplicates are harmless; they know there will only be one letterbox for the mail to be delivered to. A human will resolve the ambiguity. For the fibre project it is a different matter. By not eliminating these duplicates you risk over-servicing the premises at Unit 4 123 Main St with multiple fibre allocations.
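To make the subtle cases concrete, here is a minimal sketch of the kind of ‘intelligence’ a Transform stage could apply. It is an illustration under stated assumptions, not our production logic: the abbreviation table, the address pattern and the function names are all hypothetical, and using only the first number of a range such as 123-127 as the match key is a deliberate simplification. It flags candidate duplicates for review rather than deleting anything.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real project would use the
# postal authority's own list.
SUFFIXES = {"st": "street", "rd": "road", "ave": "avenue"}

# Assumed address shape: optional "Unit N", a street number or
# number range, then the street name.
ADDRESS = re.compile(
    r"^(?:unit\s+(?P<unit>\w+)\s+)?"
    r"(?P<first>\d+)(?:\s*-\s*(?P<last>\d+))?\s+"
    r"(?P<street>.+)$"
)


def match_key(raw):
    """Return (unit, first street number, normalised street) or None."""
    text = re.sub(r"[.,]", " ", raw.lower())
    text = " ".join(text.split())  # collapse repeated whitespace
    m = ADDRESS.match(text)
    if m is None:
        return None  # unparsed rows are left for a human to inspect
    words = [SUFFIXES.get(w, w) for w in m.group("street").split()]
    # Simplification: a range such as 123-127 is keyed on its first number.
    return (m.group("unit"), int(m.group("first")), " ".join(words))


def candidate_duplicates(addresses):
    """Group raw address strings that share a match key."""
    groups = defaultdict(list)
    for raw in addresses:
        key = match_key(raw)
        if key is not None:
            groups[key].append(raw)
    return [rows for rows in groups.values() if len(rows) > 1]


addresses = [
    "123 Main Street",
    "123 Main St",
    "Unit 4 123-127 Main Street",
    "Unit 4 123 Main St",
    "125 Main Street",
]
for group in candidate_duplicates(addresses):
    print(group)
```

Whether two rows in a group really are the same premises is still a judgement call, which is why the sketch reports groups instead of merging them.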

Internal inconsistencies

There was never a single source data set. Just because you were handed a single file doesn’t mean it was created that way. Mergers and acquisitions may have resulted in a mega data set whose parts are similar in format and structure only.

A classic case is land parcels, which are typically defined by polygons. While the notion of discrete parcels with discrete owners is a cornerstone of our economy, there are numerous ways to represent the information, often with different jurisdictions applying their own policy. One may represent a subdivision as two distinct polygons overlaid on the original parcel, resulting in three overlapping parcels. Others may remove the original. And in more complex subdivisions there may be private parcels and commonly owned parcels interlocking like a jigsaw, all overlaid on the original parcel. Town planners and architects will be familiar with the local policy, so from their perspective the data is consistent and usable, but if your project spans jurisdictions you should expect each to have generated its data in a different manner. Unless you can detect and resolve these differences you risk process churn and confusion in your team.
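As a rough illustration of detecting the ‘subdivision overlaid on the original’ pattern, the sketch below flags pairs of parcels where a representative point of one falls inside the other, using a standard ray-casting point-in-polygon test. It is a heuristic, not a full overlap test: the parcel identifiers and coordinates are made up, the mean of the vertices is assumed to lie inside its own parcel (true for convex shapes), and points exactly on a boundary may be classified either way. A real project would more likely use a geometry library.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point is inside polygon (list of (x, y))."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertex_mean(polygon):
    """Mean of the vertices; assumed to be interior (convex parcels)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def overlapping_pairs(parcels):
    """Return sorted id pairs where one parcel's mean lies in the other."""
    pairs = set()
    for a, poly_a in parcels.items():
        for b, poly_b in parcels.items():
            if a != b and point_in_polygon(vertex_mean(poly_a), poly_b):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Hypothetical data: an original parcel, its two subdivisions left
# overlaid on it, and an unrelated neighbour.
parcels = {
    "original": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "lot_a": [(0, 0), (4, 0), (4, 10), (0, 10)],
    "lot_b": [(4, 0), (10, 0), (10, 10), (4, 10)],
    "neighbour": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(overlapping_pairs(parcels))
```

A flagged pair only says the two polygons need attention. Deciding whether to keep the original parcel or its subdivisions depends on the jurisdiction’s policy, which is exactly the knowledge the data does not carry with it.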

Missing and erroneous data

Where a third party provides infrastructure data such as poles or conduit, we frequently assume the data will be correct and complete. For instance, we assume the geometric path of the conduit will match the physical world.

This ignores the maxim “If it ain’t broke, don’t fix it”. When that third party built the infrastructure they needed reasonable data, but once it is in operation they have little interest in maintaining the data, other than when a fault occurs. So if their infrastructure is stable, and on the rare occasions there is a fault the service agent in the field uses initiative to fill in the gaps, the third party has no incentive to maintain their data.

Unless you establish a process and culture that is tolerant of these errors and omissions, you risk creating a system that spends more time in feedback and rectification than in progressing the build of your network.
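One way to build that tolerance into the load step is to triage records rather than reject the whole file. The sketch below is only an illustration: the field names (`pole_id`, `x`, `y`) and the `triage` function are hypothetical, and ‘incomplete’ here means nothing more than a required field being absent or empty. It cannot tell you whether a value that is present is actually true in the field.

```python
def triage(records, required=("pole_id", "x", "y")):
    """Split records into usable ones and ones needing field verification."""
    usable, to_verify = [], []
    for rec in records:
        # A field counts as missing if it is absent, None or an empty string.
        missing = [f for f in required if rec.get(f) in (None, "")]
        if missing:
            to_verify.append((rec, missing))
        else:
            usable.append(rec)
    return usable, to_verify


# Hypothetical pole records from a third party.
records = [
    {"pole_id": "P1", "x": 1.0, "y": 2.0},
    {"pole_id": "P2", "x": None, "y": 3.0},
    {"pole_id": "", "y": 4.0},
]
usable, to_verify = triage(records)
print(len(usable), "usable;", len(to_verify), "to verify")
for rec, missing in to_verify:
    print(rec, "missing:", missing)
```

Design can proceed on the usable records while the rest go to a verification queue, instead of the whole build waiting on a perfect file.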

Free data. Sometimes it is the gift that keeps on taking.
