Mobility Data Methodology and Analysis is a short but detailed description of the methodology the City of Minneapolis followed to manage and analyze data collected during a motorized scooter pilot program. It focuses on how the city protected privacy and minimized the potential use or release of sensitive information through anonymization and aggregation.
The license agreements between the city and scooter operators prohibited the city from obtaining any personally identifiable information (PII) and required that service providers put in place good security practices to protect any PII that they collected as part of their operations. The agreements also laid out the city’s purpose in collecting the data, how the data was to be provided, what data the city would make publicly available, and what data each provider had to make available to the public. The methodology [1] was developed to be consistent with the Minnesota Government Data Practices Act (https://www.house.leg.state.mn.us/hrd/pubs/dataprac.pdf).
The city used a Python front end and a Microsoft SQL Server database to consume and store the data. Access to the server was restricted, as was access to the API authorization tokens. Analysis and visualization were done using Python, R, and Tableau.
Although the city collected no PII, it did collect location-specific, trip-level data, which is potentially re-identifiable. The report describes the methods the city used to minimize this possibility, including seven techniques for anonymizing the data. Among them:
- Processing all incoming API data in memory using Python. No raw data was stored, only anonymized data.
- Trip IDs received through MDS, although already hashed into unique values intended for anonymization, were discarded and new IDs generated, making it harder to link records back to the providers’ data.
- Trip times and locations were binned, and the original trip times and locations discarded.
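The three techniques listed above can be illustrated with a minimal Python sketch. It is not the city's code: the field names (`trip_id`, `start_time`, `start_lat`, and so on), the 30-minute time bin, and the 0.005-degree grid cell are all hypothetical choices made for illustration, since this summary does not give the bin sizes or spatial units the report actually used.

```python
import math
import uuid
from datetime import datetime, timezone

# Assumed bin sizes for illustration only; not taken from the report.
TIME_BIN_MINUTES = 30   # must divide 60 evenly
GRID_DEGREES = 0.005    # side length of a square lat/lon grid cell


def bin_time(ts: datetime, minutes: int = TIME_BIN_MINUTES) -> datetime:
    """Round a timestamp down to the start of its time bin."""
    floored_minute = (ts.minute // minutes) * minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def bin_coord(value: float, cell: float = GRID_DEGREES) -> float:
    """Snap a coordinate down to the lower edge of its grid cell."""
    return round(math.floor(value / cell) * cell, 6)


def anonymize_trip(raw: dict) -> dict:
    """Build an anonymized record from a raw trip held only in memory.

    The provider's trip ID is never copied; a fresh random ID replaces it.
    Exact times and coordinates are replaced by their bins.
    """
    return {
        "trip_id": uuid.uuid4().hex,  # new ID, unrelated to the provider's
        "start_time": bin_time(raw["start_time"]).isoformat(),
        "end_time": bin_time(raw["end_time"]).isoformat(),
        "start_lat": bin_coord(raw["start_lat"]),
        "start_lon": bin_coord(raw["start_lon"]),
        "end_lat": bin_coord(raw["end_lat"]),
        "end_lon": bin_coord(raw["end_lon"]),
    }


# Hypothetical raw record, as it might look after parsing an API response.
raw_trip = {
    "trip_id": "provider-hash-abc",
    "start_time": datetime(2018, 8, 1, 14, 47, 12, tzinfo=timezone.utc),
    "end_time": datetime(2018, 8, 1, 15, 3, 40, tzinfo=timezone.utc),
    "start_lat": 44.9778,
    "start_lon": -93.2637,
    "end_lat": 44.9812,
    "end_lon": -93.2701,
}

anonymized = anonymize_trip(raw_trip)
```

In a pipeline of this shape, only the `anonymized` record would be written to the database; the raw record would be left to go out of scope without being persisted.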
The report describes several specific issues that arose, such as differing interpretations of standards, the absence of historical data in GBFS, and bad data. The project began with GBFS data only; the MDS standard became available midway through the project and was then incorporated into the data reporting requirements. The specific MDS and GBFS data fields used in the pilot are provided and discussed in the document's two appendices.
[1] Minneapolis city government. (n.d.). Mobility Data Methodology and Analysis. Minneapolis: Minneapolis, MN.