A longlist of data access solutions for safety tech developers

In this blog post, we share an update on findings from the Online Safety Data Initiative (OSDI), covering our problem identification and longlisting workstreams and our final longlist of solutions.

Problem identification

Over the last few months, we have been investigating the data access challenges faced by the safety tech sector, in order to identify opportunities to improve the quality and availability of training data for innovators.

Our research has centred on conducting extensive stakeholder interviews to define the key challenges faced by the majority of safety tech providers. This has involved nearly 70 interviews with organisations across the sector, taking into account differences relating to harm type, type of product/service offering, product/commercial maturity, type of technology, and international focus.

Based on our research we have identified 5 key macro problems faced by safety tech innovators:

  • safety tech innovators do not have enough high quality data to develop their models
  • safety tech providers struggle to demonstrate products’ performance
  • data is inconsistently labelled across the sector due to a lack of standardised schema
  • there is a lack of clear guidelines on handling and sharing online harms data
  • safety tech providers struggle to fully understand evolving client needs, which limits their ability to build more effective, market-ready solutions

Developing a longlist of solutions

Having closely defined the key user needs, we explored potential technical and non-technical solutions that could solve the challenges, either through this project or via government’s wider activities.

This involved mixed stakeholder ideation workshops with Online Safety Tech Industry Association (OSTIA) members to validate our user needs and seek potential solutions from safety tech providers. We also convened a cross-Whitehall roundtable to understand best practice across the public sector.

Image shows Jamboard of sticky notes in response to the question “Would access to a centralised repository of data for model training be useful to you?”. Responses are posted on sticky notes of various colours.
Workshop attendees provided feedback on potential technical solutions.

Our final longlist

Across our 5 challenges, we identified 23 solutions to meet user needs. These range from generating synthetic data, to improve the availability of high-quality data for certain harm types, to a universal taxonomy that encourages a standardised approach to describing online harms, and training and guidance resources that promote best practice in data sharing. The full list is set out below.

Challenge 1: Safety Tech do not have enough high-quality data

Sub-challenge: Closed datasets exist but are not available for use
User need: Open closed datasets
Solutions:
  • Data Repository
  • Trusted Research Environment
  • Federated Learning
  • Synthetic data sharing
  • Hash-matching database of text-based harms data
  • Create new access routes for existing closed databases

Sub-challenge: Open-source data is difficult to find
User need: Collate open-source data
Solutions:
  • Open-source data repository

Sub-challenge: Open data doesn't exist for certain harm types
User need: Collect/generate and make available new, high-quality data
Solutions:
  • Repository of data donations
  • Repository of harms reporting
  • Harms metadata collection
  • Synthetic data generation

Challenge 2: Safety Tech struggle to demonstrate products’ performance

Sub-challenge: Safety tech firms struggle to accredit and benchmark products
User need: Create a product benchmarking / evaluation testbed
Solutions:
  • Product evaluation standard and accreditation

Challenge 3: Data is inconsistently labelled across the sector

Sub-challenge: Safety Tech have to develop internal schema and relabel available datasets
User need: A standardised approach to describing online harms
Solutions:
  • Universal taxonomy
  • Harm-specific taxonomy
  • Data standard
  • Federated labelling

Challenge 4: There is a lack of clear guidelines on handling and sharing online harms data

Sub-challenges:
  • Innovators and researchers are unclear on data handling best practice
  • Safety Tech providers have limited knowledge of PETs
User need: Create centralised training and guidance on handling online harms data
Solutions:
  • Industry-wide online harms data sharing and governance toolkit
  • Data practitioners network hosting training workshops on PETs and data security relating to online harms

Challenge 5: Client needs are not fully understood

Sub-challenge: Smaller safety tech firms have a limited understanding of the needs of clients
User need: Improve engagement between Safety Tech and end-customers
Solutions:
  • Challenge competition/data hackathons
  • Safety Tech Procurement Framework
  • B2B Marketplace for Safety Tech
  • Market engagement events
  • Public procurement guidance

Next steps

We are currently developing a shortlisting framework to determine which ideas we take forward into the technical phase of the project. In the next blog post, we will set out findings from our data security workstream.
