A longlist of data access solutions for safety tech developers

In this blog post, we share an update on findings from the OSDI, covering our problem identification and longlisting workstreams and our final longlist of solutions.

Problem identification

Over the last few months, we have been investigating the challenges faced by the safety tech sector relating to data access, in order to determine opportunities to improve the quality and availability of training data for innovators.

Our research has centred on conducting extensive stakeholder interviews to define the key challenges faced by the majority of safety tech providers. This has involved nearly 70 interviews with organisations across the sector, taking into account differences relating to harm type, type of product/service offering, product/commercial maturity, type of technology, and international focus.

Based on our research we have identified 5 key macro problems faced by safety tech innovators:

safety tech innovators do not have enough high quality data to develop their models
safety tech providers struggle to demonstrate products’ performance
data is inconsistently labelled across the sector due to a lack of standardised schema
there is a lack of clear guidelines on handling and sharing online harms data
safety tech providers struggle to fully understand evolving client needs, which limits their ability to build more effective, market-ready solutions

Developing a longlist of solutions

Having closely defined the key user needs, we explored potential technical and non-technical solutions that could solve the challenges, either through this project or via government’s wider activities.

This involved mixed stakeholder ideation workshops with Online Safety Tech Industry Association (OSTIA) members to validate our user needs and seek potential solutions from safety tech providers. We also convened a cross-Whitehall roundtable to understand best practice across the public sector.

Image: Jamboard of sticky notes of various colours responding to the question “Would access to a centralised repository of data for model training be useful to you?”. Caption: Workshop attendees provided feedback on potential technical solutions.

Our final longlist

Across our 5 challenges, we identified 23 solutions to meet user needs. These range from generating synthetic data to improve the availability of high-quality data for certain harm types, to a universal taxonomy that encourages a standardised approach to describing online harms, to training and guidance resources that promote best practice in data sharing. The full list is set out in the table below.

Challenge	Sub-challenge	User need	Solutions
1. Safety Tech do not have enough high-quality data	Closed datasets exist but are not available for use	Open closed datasets	Data Repository; Trusted Research Environment; Federated Learning; Synthetic data sharing; Hash-matching database of text-based harms data; Create new access routes for existing closed databases
	Open-source data is difficult to find	Collate open-source data	Open-source data repository
	Open data doesn't exist for certain harm types	Collect/generate and make available new, high-quality data	Repository of data donations; Repository of harms reporting; Harms metadata collection; Synthetic data generation
2. Safety Tech struggle to demonstrate products’ performance	Safety tech firms struggle to accredit and benchmark products	Create a product benchmarking / evaluation testbed	Product evaluation standard and accreditation
3. Data is inconsistently labelled across the sector	Safety Tech have to develop internal schema and relabel available datasets	A standardised approach to describing online harms	Universal taxonomy; Harm-specific taxonomy; Data standard; Federated labelling
4. There is a lack of clear guidelines on handling and sharing online harms data	Innovators and researchers are unclear on data handling best practice; Safety Tech providers have limited knowledge of PETs	Create centralised training and guidance on handling online harms data	Industry-wide online harms data sharing and governance toolkit; Data practitioners network hosting training workshops on PETs and data security relating to online harms
5. Client needs are not fully understood	Smaller safety tech firms have a limited understanding of the needs of clients	Improve engagement between Safety Tech and end-customers	Challenge competition/data hackathons; Safety Tech Procurement Framework; B2B Marketplace for Safety Tech; Market engagement events; Public procurement guidance

Next steps

We are currently developing a shortlisting framework to determine which ideas we take forward into the technical phase of the project. In the next blog post, we will set out findings from our data security workstream.
