We want to see how the government will make sure that AI use in government is making public services better, and not worse.
When the government is using AI that could have a big impact on us, we want to see how they plan to manage the risks. This may mean:
setting a date to check if the model is still working as expected,
deciding not to use the AI for certain tasks, or
only using the AI for low-risk parts of the work.
For example, ACC currently uses AI to automatically approve people's benefits if it's a straightforward ACC claim. The system has a very high accuracy rate and enables claim decisions to be approved in seconds. However, a human has to make the decision if a claim will be denied.
We also want to make sure that the plan is followed. Therefore, we're asking for a formal oversight mechanism -- someone who can check that the plan has been made correctly, and that the system continues to be safe. This could be within the agency for low and medium-risk uses, and through a central oversight agency like the Public Service Commission or the Government Digital Delivery Agency (or someone else) for high-risk uses.
Knowing that an AI system is being used is only half the battle; we also need to know that the necessary thinking was done before the system was deployed, and that there is ongoing monitoring to ensure it is working correctly. New Zealand currently relies on a patchwork of voluntary guidelines and internal agency policies to ensure that government AI systems are safe and fair. Most people in the public sector want to do this well. But when a machine is handed the power to influence life-altering decisions, good intentions are not a substitute for formal oversight.
We believe that oversight must be legally mandated where there is a material level of risk. We don't want to clog up the public service with red tape for low-risk administrative tools. But systems that can have a material impact on someone's life demand rigorous assessment and auditing. Formal oversight mechanisms ensure that experts can look under the hood, verify performance, and stop systems that exhibit poor performance or bias before they cause further harm.
Furthermore, if you want to challenge a decision made by an AI/algorithm, you need to know enough detail about how the system works and who is accountable for the system. The principles of natural justice demand that procedural unfairness be identified and corrected, and we can't wait for legal challenges to surface this information. We need this information to be published by default, not dependent on OIA or legal requests.
The call for formal oversight mechanisms has two parts - impact assessment while the systems are designed and developed, and ongoing governance after the systems have been deployed. This helps shift public perception from blind trust to earned confidence. This gives both the public and our public servants the reassurance that AI is being used safely, equitably, and in a way that actively upholds our rights and values. This aligns with our next idea around public deliberation.
Canada’s Algorithmic Impact Assessment Tool
Singapore’s Government Developer Portal includes a Responsible AI Playbook
The UK government’s guidance on algorithmic transparency includes specific guidance about impact assessments.
The current (complex) mechanisms for AI oversight in the NZ government are summarised in this report by Johniel Bocacao.
Under the Algorithm Charter, StatsNZ has published the Algorithm Impact Assessment developed by Frith Tweedie. Using this toolkit is voluntary, and we haven't found any completed assessments that have been published proactively in New Zealand. Either no one is doing the assessments, or we aren't being told about them.
The Algorithm Charter chapter in More Zeros and Ones calls for the government to consider adopting an Algorithmic Impact Assessment tool like Canada's.
The Public Service AI Work Programme has an item to deliver a "Public Service AI Assurance Model". This would likely apply to agencies under the Public Service Act 2020, but no details about this model have been published yet.
We expect that the details will be debated, but here are some ideas for what formal oversight could include:
Before any tool is deployed, a standardised risk matrix can be used to determine if the level of risk is material enough to justify formal oversight.
For systems that are medium-risk, the agency could be responsible for the oversight. For systems that are high-risk, there should be independent oversight from another central government agency (such as the Public Service Commission).
Agencies would be required to add their AI system to the public register, publish a plain-language AI Impact Assessment (AIA) modelled on existing tools, and demonstrate that they have ongoing performance auditing and human oversight.
Agency Chief Executives can be held legally accountable under the Public Service Act for ensuring that medium/high-risk systems are appropriately disclosed and audited on an ongoing basis.
Ideally, a small oversight/governance unit (likely sitting within an existing Commission or agency) should be given the authority to issue compliance notices (or similar) on agencies that have demonstrably run AI systems poorly.
Nobody wants to do more work for no reason. But when you look at why some modern technology projects have failed, bad governance is often what actually slows things down. Brakes on a high-performance car don't exist to make the car slow - they exist so you can drive fast safely without crashing.
Risk-averse government agencies are slow to approve and adopt AI systems right now because it is not clear if they have done everything they need to do to be safe. Project teams spend months arguing internally about ethics, privacy, and public perception - delays are expensive! Having appropriate formal oversight can address that systemic anxiety and give agencies a clear, standardised path to safe deployment.
This idea acknowledges that not all technology is equally risky, and that there are different categories and types of risk. This will need to be developed further, but basic systems don't need to have the full bells and whistles. Strict audits and oversight mechanisms would be reserved for the medium and high-risk systems that inform or make life-altering decisions about us.