dual evaluation

1 month ago · ai

[Paper] DUALGAUGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation

Large language models (LLMs) and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement rema...

#secure code generation #LLM benchmarking #software security #AI research #dual evaluation