내 노트북에서 실행한 Qwen3.6-35B-A3B가 Claude Opus 4.7보다 더 나은 펠리컨을 그려줬다

발행: 3주 전 (2026년 4월 17일 AM 02:37 GMT+9)

3 분 소요

Source: Hacker News

16th April 2026

Qwen 3.6‑35B‑A3B

Generated using the 20.9 GB Qwen3.6‑35B‑A3B‑UD‑Q4_K_S.gguf quantized model by Unsloth, running on a MacBook Pro M5 via LM Studio (and the llm‑lmstudio plugin).
Transcript

자전거 프레임이 올바른 형태입니다. 하늘에 구름이 있습니다. 펠리컨이 어색해 보이는 주머니를 가지고 있습니다. 바닥에 적힌 캡션은 “펠리컨이 자전거 위에!”입니다.

Claude Opus 4.7

Generated with Anthropic’s brand‑new Claude Opus 4.7.
Transcript

자전거 프레임이 완전히 잘못된 형태입니다. 구름이 없고 노란 해가 있습니다. 펠리컨이 뒤를 바라보고 있으며, 내가 원하는 것보다 덜 뚜렷한 주머니를 가지고 있습니다.

I’m giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!

Opus with `thinking_level: max`

A second attempt with thinking_level: max didn’t improve the result much.
Transcript

자전거 프레임이 완전히 잘못된 형태이지만 다른 방식입니다. 선이 더 굵어졌습니다. 펠리컨이 조금 더 펠리컨처럼 보입니다.

I don’t think Qwen are cheating

A lot of people are convinced that the labs train for my stupid benchmark. I don’t think they do, but honestly this result gave me a little glint of suspicion. So I’m burning one of my secret backup tests—here’s what I got from Qwen 3.6‑35B‑A3B and Opus 4.7 for “Generate an SVG of a flamingo riding a unicycle”. I’m giving this one to Qwen too, partly for the excellent “ SVG comment.

What can we learn from this?

The pelican benchmark has always been meant as a joke—it’s mainly a statement on how obtuse and absurd the task of comparing these models is.

The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those first pelicans from October 2024 were junk. The more recent entries have generally been much, much better—to the point that Gemini 3.1 Pro produces illustrations you could actually use somewhere, provided you had a pressing need to illustrate a pelican riding a bicycle.

Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21 GB quantized version of their latest model is more powerful or useful than Anthropic’s latest proprietary release.

If the thing you need is an SVG illustration of a pelican riding a bicycle, right now Qwen 3.6‑35B‑A3B running on a laptop is a better bet than Opus 4.7!

내 노트북에서 실행한 Qwen3.6-35B-A3B가 Claude Opus 4.7보다 더 나은 펠리컨을 그려줬다

Qwen 3.6‑35B‑A3B

Claude Opus 4.7

Opus with `thinking_level: max`

I don’t think Qwen are cheating

What can we learn from this?

관련 글

Monero 커뮤니티 크라우드펀딩 시스템

OpenAI 광고 파트너, 이제 ‘프롬프트 관련성’ 기반 ChatGPT 광고 배치 판매.

Show HN: Holos – QEMU/KVM과 compose-style YAML, GPU 및 헬스 체크

NASA는 아폴로 11호 우주비행사들에게 욕설을 사용하지 않도록 훈련시켜야 했다

Qwen 3.6‑35B‑A3B

Claude Opus 4.7

Opus with thinking_level: max

I don’t think Qwen are cheating

What can we learn from this?

관련 글

Monero 커뮤니티 크라우드펀딩 시스템

OpenAI 광고 파트너, 이제 ‘프롬프트 관련성’ 기반 ChatGPT 광고 배치 판매.

Show HN: Holos – QEMU/KVM과 compose-style YAML, GPU 및 헬스 체크

NASA는 아폴로 11호 우주비행사들에게 욕설을 사용하지 않도록 훈련시켜야 했다

Qwen 3.6‑35B‑A3B

Claude Opus 4.7

Opus with `thinking_level: max`