The city of San Diego is taking a step toward publishing more than 100 sets of data online, ranging from all of the places fire hydrants have been knocked over to how much revenue comes from each parking meter to the number of seals at the La Jolla Children’s Pool.
On Monday, the city released a list of 115 data sets it plans to start making public in July. The list, along with information on how the city is using the data and how the list was created, is on a new website, datasd.org.
The website also allows the public to vote on what data sets should be published first. That publication will begin July 1 and will continue over the next five years, said Almis Udrys, the city’s director of performance and analytics.
He said he hopes the publication of databases empowers residents to get more information, encourages start-up companies to find uses for the data and increases civic engagement.
“Ultimately this data belongs to the taxpayers, and I know (Mayor Kevin Faulconer) feels strongly about that,” Udrys said. “We want them to feel connected and feel like our government is helping them accomplish whatever task, whether it’s starting a business, seeing when their streets are going to get paved, help them accomplish what they’re looking to accomplish.”
On the website, users can scroll through all of the data sets that will be made public and vote on what they’d like to see first. However, that vote will be only one consideration the city uses in deciding the release order. Udrys said other factors include whether the mayor or City Council wants a data set published, how frequently it is requested under the Public Records Act and how much work is needed to get it into a format usable by the public.
He said he doesn’t yet know what data sets will be published first, but he hopes public input will help the city prioritize.
Residents can also use the website to suggest data sets they don’t see on the list.
“If you don’t see a data set that’s in here that you either think we have or that you would like us to have, you’ll be able to submit that information as well,” he said. “That will help guide us on future data sets that we either look for or try to create.”
The list was a year in the making; work on it began soon after San Diego’s open data policy was passed in December 2014. That policy called for the city to take an inventory of all available data sets and publish online every data set that could be made public. It also called for publishing the list of public data sets and hiring a chief data officer to oversee the process. The city has since hired Maksim Pecherskiy for that role, along with an open data coordinator, Andrell Bower.
The city denied an earlier Public Records Act request for the list of data sets because the list was not complete, Udrys said.
He said city departments together submitted more than 2,000 data sets to his office, covering everything they thought could possibly be data. Pecherskiy and Bower had to go through all of it to ensure that each item was actually a data set (not, for example, a Word document or something that could not be quantified) and to weed out data sets that would not eventually be made public. Those include personnel records, anything with personal information about residents, or data that the city deems a security risk.
Udrys said the city also could not release the list alone, without the data itself, because the city’s IT department had cybersecurity concerns.
“When hackers are exploring how to attack a network, they’ll look for names and versions of specific technologies,” he said. “Sometimes those can be hinted and can even be in a data set name. We wouldn’t want to have this information out there and make our software vulnerable to any type of attack.”
While the city initially identified more than 2,000 data sets, it’s only publishing a list of 115 for now because those are the data sets that are ready to be made public, Udrys said. He expects the list will grow as more data is deemed ready for public release.
In a report to the City Council in September, he said about half of the 2,000 data sets are likely to eventually be made public. Udrys told KPBS those data sets will continue to be published over the next four years. All of the city’s data that can be made public will be published by Jan. 1, 2020.
Peter Scheer, executive director of the First Amendment Coalition, said he’s glad the list will be made public and called plausible the city’s explanation that releasing an unfinished draft, with data sets that had not been scrubbed, was a security concern.
He said when the city denied the Public Records Act request, “it looked like they might be inventing an excuse to not release it period.”
“It sounds like that is not the case, so I applaud them for it,” he said.
Xavier Leonard, a spokesman for the open government advocacy group Open San Diego, said while he wants more public data, he’s also not disappointed that the city is starting with 115 data sets.
“It’s part of an ongoing process,” he said.
He said the release of the list is “a very big deal” because it marks the midway point in the larger goal.
“That includes having identified all the data sets and within those targeting the ones that are high value, and then giving the opportunity for citizens to help specify what ones they’d like to release first,” he said.
The website datasd.org will also make its own code available on the collaborative software website GitHub.com. Udrys said he hopes that will allow San Diego coders to fix bugs on the website and make suggestions to improve it.
Udrys acknowledged the city of San Diego has lagged behind other cities in making its data open, but said he hopes that will now change.
“I think we’re going to be a nationally recognized leader in this movement,” he said. “It’s pretty easy to put up a flashy website or a flashy picture and say you’re doing open data. It’s quite another thing to be empowered to do it well, do it right and make sure it’s information people can trust.”