In this blog post, we share an update on findings from the OSDI, covering our problem identification and longlisting workstreams and our final longlist of solutions.
Over the last few months, we have been investigating the challenges faced by the safety tech sector relating to data access, in order to determine opportunities to improve the quality and availability of training data for innovators.
Our research has centred on conducting extensive stakeholder interviews to define the key challenges faced by the majority of safety tech providers. This has involved nearly 70 interviews with organisations across the sector, taking into account differences relating to harm type, type of product/service offering, product/commercial maturity, type of technology, and international focus.
Based on our research we have identified 5 key macro problems faced by safety tech innovators:
- safety tech innovators do not have enough high quality data to develop their models
- safety tech providers struggle to demonstrate products’ performance
- data is inconsistently labelled across the sector due to a lack of standardised schema
- there is a lack of clear guidelines on handling and sharing online harms data
- safety tech providers struggle to fully understand evolving client needs, which limits their ability to build more effective, market-ready solutions
Developing a longlist of solutions
Having closely defined the key user needs, we explored potential technical and non-technical solutions that could solve the challenges, either through this project or via government’s wider activities.
This involved mixed-stakeholder ideation workshops with Online Safety Tech Industry Association (OSTIA) members to validate our user needs and seek potential solutions from safety tech providers. We also convened a cross-Whitehall roundtable to understand best practice across the public sector.
Our final longlist
Across our 5 challenges, we identified 23 solutions to meet user needs. These range from generating synthetic data to improve the availability of high-quality data for certain harm types, to a universal taxonomy to encourage a standardised approach to describing online harms, to training and guidance resources to promote best practice in data sharing. The full list is set out below.
| Challenge | Underlying problem | Solution area |
|---|---|---|
| 1. Safety Tech do not have enough high-quality data | Closed datasets exist but are not available for use | Open closed datasets: Data Repository; Trusted Research Environment; Federated Learning; Synthetic data sharing; Hash-matching database of text-based harms data; Create new access routes for existing closed databases |
| | Open-source data is difficult to find | Collate open-source data |
| | Open data doesn't exist for certain harm types | Collect/generate and make available new, high-quality data |
| 2. Safety Tech struggle to demonstrate products’ performance | Safety tech firms struggle to accredit and benchmark products | Create a product benchmarking / evaluation testbed |
| 3. Data is inconsistently labelled across the sector | Safety Tech have to develop internal schema and relabel available datasets | A standardised approach to describing online harms |
| 4. There is a lack of clear guidelines on handling and sharing online harms data | Innovators and researchers are unclear on data handling best practice; Safety Tech providers have limited knowledge of PETs | Create centralised training and guidance on handling online harms data |
| 5. Client needs are not fully understood | Smaller safety tech firms have a limited understanding of the needs of clients | Improve engagement between Safety Tech and end-customers |
We are currently developing a shortlisting framework to determine which ideas we take forward into the technical phase of the project. In the next blog post, we will set out findings from our data security workstream.