News'n'Clues - TECHNOLOGY Article Summaries

Total lunar eclipse will turn the full moon red — but will the skies be clear enough to see it?

Source: {'href': 'https://www.geekwire.com', 'title': 'GeekWire'} Published: 2026-03-02 18:31:58

Tech Moves: HP director departs; AWS leader leaves; Truveta adds new leaders

Source: {'href': 'https://www.geekwire.com', 'title': 'GeekWire'} Published: 2026-03-02 18:20:12

— Elizabeth Scallon, a longtime leader in Seattle's startup ecosystem, has left HP after serving for nearly four years as director of technical and business incubation and strategy. “At HP, I had the privilege of diving deep into technologies ranging from microfluidics and chip cooling to edge systems, security silicon, collaboration platforms, biometrics, authentication, and computer vision. She was director of the UW's CoMotion Labs for five years and co-founded Find Ventures, an investment firm that emphasized equitable access to capital. Blandy's position was based in Santa Monica, Calif. Past roles include executive leadership at Walt Disney, Fox and Hulu. The early stage Seattle startup is building tools for AI-native teams where humans and coding agents work side-by-side. In April 2025, Brace left Remitly as executive vice president of consumer product to take a sabbatical. — Edo co-founder and former director of strategy Courtney Blodgett has left the Seattle-based energy software company. “I've had the privilege of helping grow an idea into a company delivering demand flexibility and customer support to utilities and 7,000+ buildings across the country,” Blodgett said on LinkedIn. — Seattle-area health data company Truveta hired a slate of new employees, including multiple senior leadership positions: The company in January named Dr. Johnathan Lancaster as it president and chief scientific officer. — Sustainable tech startup Bayou Energy named Yoon Loon Wong (Andrew) as chief of staff. — Brian Hansford is senior vice president of marketing at the National Cybersecurity Alliance, a Seattle-based nonprofit supporting cybersecurity education and safety for individuals and businesses. Other past roles include leadership at LiveRamp, Icertis, MediaPRO and others. — Seattle-based Scott Schliebner is chief operating officer at P1 Trials, a startup that describes itself as “a network of world-class, community-based oncology investigative sites capable of performing complex Phase 1 clinical trials.” — After more than four years, Rob Moore left his role as vice president of order-to-cash transformation at Seattle payment tech company Remitly. It was my honor to fight alongside the ‘good guys' at Remitly day in and day out, on behalf of our resilient and inspiring customers,” Moore said on LinkedIn. Remitly co-founder Matt Oppenheimer last month announced that he is stepping down as CEO after nearly 15 years. — Manisha Arora was promoted to vice president of the California-based cloud company ServiceNow. Arora, who works in the company's Kirkland, Wash. offices, has been with ServiceNow for nearly 10 years. She was previously at Microsoft for more than a decade in program management roles. — Monod Bio, a Seattle biotech company performing computational protein design, named Robert Bujarski to its board of directors. Bujarski previously served as EVP and chief operating officer at QuidelOrtho Corporation for 20 years. — Fred Hutch Cancer Center announced 12 recipients of the Harold M. Weintraub Graduate Student Award, named after the molecular biologist who helped establish Fred Hutch's Basic Sciences Division and died of brain cancer in 1995. Great teams still win, and finding them is harder than ever. Together, we help employers cut through the noise and hire smarter, faster. Learn more about GeekWork: Contact GeekWire co-founder John Cook at [email protected]. Tech Moves: Zillow names CPO; AWS leader retires; Microsoft hires AI expert from Apple Head of Amazon's AGI lab is leaving in latest exit from high-profile Adept startup deal Tech Moves: Code.org has a new leader; Synapse vet joins Amazon; ex-Tableau CEO lands at Code Metal Find Ventures shuts down program that helped Washington startups land non-dilutive funding Tech Moves: Truveta hires president; Veeam bolsters exec lineup; Fortive names HR chief; Heard taps CTO Tech Moves: Truveta co-founder joins GeneDX; Auger hires another Amazonian; Microsoft board change Tech Moves: Read AI adds product VP; Microsoft legal chief steps down; Truveta co-founder departs; Marchex CEO is out

Bay Area-based Aetherflux to join Seattle space race with new hub for satellite development

Source: {'href': 'https://www.geekwire.com', 'title': 'GeekWire'} Published: 2026-03-02 18:17:14

Aetherflux, a Bay Area-based space startup, is expanding to Seattle to open what it calls a “core center for satellite development.” In a post on LinkedIn last week, Aetherflux said its team is growing and the company is currently hiring across all disciplines, from engineering to operations. Founded in 2024 by CEO Baiju Bhatt, the billionaire co-founder of the trading platform Robinhood, Aetherflux is currently focused on creating an orbital data center satellite. The company says the goal is for its constellation of satellites — which it calls “Galactic Brain” — to leverage solar power in space to address the massive energy needs on Earth for artificial intelligence. SpaceX, which produces satellites for its Starlink broadband constellation from its Redmond, Wash., facility, is seeking approval from the Federal Communications Commission for its plan to put up to a million satellites in orbit to process data for artificial intelligence applications. AI companies have been considering the idea of using solar-powered data center satellites to get around the limiting factors for ground-based facilities, such as rapidly growing requirements for electrical power as well as the availability of water for cooling systems. The elephant in the room is that our current energy plans simply won't get us there fast enough,” Bhatt said in December. The round was led by Index Ventures and Interlagos, with participation from Bill Gates' Breakthrough Energy Ventures, Andreessen Horowitz, and NEA, according to TechCrunch. Based in San Carlos, Calif., and Washington, D.C., Aetherflux's team includes people who've worked at Robinhood, SpaceX, NASA's Jet Propulsion Laboratory, Anduril and the U.S. Navy. Aetherflux has attracted to attention from the U.S. military. The Seattle-area company that comes closest to Aetherflux's target market is Redmond-based Starcloud, which is working to put a network of data center satellites in orbit. In a LinkedIn post, Starcloud CEO Philip Johnston hailed Aetherflux's Seattle plans as a positive sign for the region's space industry. … Did we kick off a new trend for space startups?” Johnston wrote. Have a scoop that you'd like GeekWire to cover? Startup radar: 5 early stage companies in Seattle that are raising cash and gaining traction Startup radar: Early stage companies use AI to boost coaching, data analysis, healthcare, and more Lumen Orbit changes its name to Starcloud and raises $10M for space data centers

In new letter to governor, Seattle tech leaders say income tax proposal will hurt region's AI innovation

Source: {'href': 'https://www.geekwire.com', 'title': 'GeekWire'} Published: 2026-03-02 17:50:46

Seattle tech leaders are warning that a new income tax proposal could stall the region's momentum in artificial intelligence. “These policies would materially undermine Washington's ability to keep growing the tech sector, which is a core driver of our economy, and would slow the AI innovation and investment momentum that we should be accelerating, not discouraging,” the letter reads. Citing Silicon Valley Bank's recent State of the Markets report, they say Seattle has seen a “significant” downturn in startup formation over the past three years, while San Francisco benefits from a deeper AI ecosystem and Texas is attracting companies with what they describe as a more favorable tax climate. The report shows that VC-backed company formation in Seattle has fallen 30% since 2022. Signatories of the letter include Pedro Domingos, professor emeritus of computer science and engineering at the University of Washington; Brian Hall, a former executive at Microsoft, Amazon Web Services, and Google; Oren Etzioni, former CEO of the Allen Institute for Artificial Intelligence; Read AI co‑founder and CEO David Shim; CloudMoyo CEO Manish Kedia; Founders' Co‑op general partner Aviel Ginzburg; AZX CEO Aaron Goldfeder; LaunchDarkly CTO Cameron Etezadi; Salesforce engineering leader Paul Brown; AJW Services managing director Adam Wray; and longtime software engineer and author Vijay Boyapati. The tax would take effect in 2030 and is expected to generate an estimated $3.7 billion annually. Others in Seattle's tech ecosystem have pushed back on the idea that higher taxes on top earners would trigger an existential threat to the startup economy. Meanwhile, many large tech employers are cutting thousands of jobs. University of Washington scientists and students are using AI to create real medicines. Better treatments for cancer, autoimmune diseases, viruses and more are now on the horizon thanks to groundbreaking work with artificial intelligence from a team of scientists at the University of Washington's Institute for Protein Design. Led by Nobel Prize winner David Baker, this team of Huskies uses AI tools to create proteins — biology's building blocks — that lay the foundation for new medicines. The institute's recent breakthroughs — including an antivenom for snakebites, and antibiotics that combat drug-resistant bacteria — show how this innovative science can save and change lives. Click for more about underwritten and sponsored content on GeekWire. Filings: How Amazon's $50B OpenAI deal actually works, and what they're keeping secret Amazon invests $50B in OpenAI, deepens AWS partnership with expanded $100B cloud deal Read AI rolls out ‘Digital Twin' that can respond to work emails and schedule meetings ‘I'm tired of that narrative': Seattle VC pushes back on tech exodus talk Washington's ‘millionaires tax' targets top earners as tech leaders warn of startup fallout Opinion: The ‘millionaires tax' is not an existential threat to Washington's startup economy

A married founder duo's company, 14.ai, is replacing customer support teams at startups

Source: {'href': 'https://techcrunch.com', 'title': 'TechCrunch'} Published: 2026-03-02 15:14:36

Founder Summit 2026 in Boston: Don't miss ticket savings of up to $300. The customer service industry is in a bit of flux, thanks to AI. Investors and corporate leaders have rung alarm bells for the BPO (Business Process Outsourcing) industry. On the other hand, AI-powered customer support startups such as Decagon, Parloa, and Sierra have picked up millions of dollars in funding from venture capitalists. 14.ai, a Y Combinator-backed startup, is taking an approach of building an AI-native agency that has replaced legacy customer support teams at many startups it has worked with. The company has raised $3 million in seed funding led by Y Combinator, with participation from General Catalyst, Base Case Capital, SV Angel, and the founders of Dropbox, Slack, Replit, and Vercel. The startup was founded by a married duo, Marie Schneegans and Michael Fester. The two met in Paris more than a decade ago and went on to build separate companies. Schneegans was a co-founder at corporate intranet company Workwell. Fester previously founded Snips, a company that worked on local first assistants for smart devices, which was acquired by Sonos in 2019. The duo picked up customer service as the problem to tackle, but didn't want to build a pure-play SaaS company. They founded 14.ai to operate as an AI-native customer support agency of sorts. We combine software and services in one package. The company said it can integrate with a support system within a day and start clearing the support ticket backlog very quickly. It can monitor tickets across various channels, including email, calls, chat, TikTok, Facebook, Telegram, and WhatsApp. His team of customer service agents was in the Philippines, and they were not being able clear tickets efficiently. We took over on Thursday morning, and by Thursday afternoon, we had cleared tickets from all channels like social media, SMS, email, chat, and voice,” Schneegans said. The startup learns workflows of customer support and other functions, such as sales and revenue growth, and tries to automate tasks through its software so humans have to spend less time on particular issues. “We are not just a support agency, but also a revenue growth engine because we capture all kinds of conversations early on for a client and get insights from them,” Fester said. The company wants to take off three key items from a startup's balance sheet, including ticketing systems, AI software add-ons, and human labor costs. The startup caters to many clients in different sectors such as luxury skin care brand Yon-KA, smart glasses maker Brilliant Labs, and lighting company Creative Lighting. The startup also wants to improve its own product by experimenting and letting AI handle most tasks. Tom Blomfield, a partner at Y Combinator, thinks that 14.ai strikes the right balance between using AI and humans for customer service. With the existing platforms, the customer is left to handle round after round of painful headcount reductions,” he told TechCrunch over email. “In contrast, 14.ai becomes the customer service department, both AI and human. They can reassign customer support agents between customers who are at different stages of the AI adoption journey, and carry out that load balancing much more effectively,” he added. Why did Netflix back down from its deal to acquire Warner Bros.? India disrupts access to popular developer platform Supabase with blocking order Jack Dorsey just halved the size of Block's employee base — and he says your company is next An accountant won a big jackpot on Kalshi by betting against DOGE It's time to pull the plug on plug-in hybrids

The First Trailer for ‘Scary Movie 6' Is Here to Eat Horror's Lunch Again

Source: {'href': 'https://gizmodo.com', 'title': 'Gizmodo'} Published: 2026-03-02 14:30:31

Apple Announces iPhone 17e With the One Feature the iPhone 16e Should Have Had

Source: {'href': 'https://gizmodo.com', 'title': 'Gizmodo'} Published: 2026-03-02 14:26:23

All products featured here are independently selected by our editors and writers. If you buy something through links on our site, Gizmodo may earn an affiliate commission. Kickstarting its “big week,” Apple has just announced the iPhone 17e. For the money, it's to be expected that the iPhone 17e won't have the same features as the iPhone 17, which remains an incredible value nearly six months after launch. So let's get right to what is new about the iPhone 17e compared to the iPhone 16e. That includes the same 60Hz refresh rate. There were rumors that the iPhone 17e might ditch the notch for the Dynamic Island, but that didn't happen. The screen on the iPhone 17e, however, is more scratch-resistant and has reduced glare thanks to Ceramic Shield 2, according to Apple. The three main upgrades that you can't immediately see just by looking at internet photos of the iPhone 17e? The iPhone 16e was rightfully criticized for not supporting Apple's magnetic charging and attachment system—now the iPhone 17e does. The MagSafe system also means you can attach wallets and other magnetic accessories to the back of the device. Now, for the same $599, you get 256GB of storage. In this economy and during these weird flash storage-challenged times, that's a really good deal. More performance for stuff like running iOS 26's Liquid Glass, 3D gaming, mobile video editing, and handling future software updates. For basic app things, you're unlikely to notice any dip in performance, but for heavier tasks—like gaming—that lean on the GPU, you will probably see a small drop. There are also two other smaller internal upgrades to the iPhone 17e: a C1X modem—the same one introduced in the iPhone Air—that's up to two times faster for 5G cellular connections and satellite connectivity (for Messages and Emergency SOS). The rear still has one 48-megapixel shooter with a 2x optical-quality “telephoto” digital lens. The selfie camera on the front is also the same 12-megapixel sensor as before; it's not the new square-shaped 18-megapixel Center Stage camera that lets you shoot horizontal photos and videos while holding the phone vertically (and vice versa) that's found on the iPhone 17, iPhone 17 Pros, and iPhone Air. The iPhone 17e will be available on March 11 in three colors: black, white, and soft pink. As I said earlier, a 256GB model will cost $599. A version with 512GB will cost $799. Explore more on these topics Subscribe and interact with our community, get up to date with our customised Newsletters and much more. Don't call it a recession indicator, but leaks say Apple is releasing low-end iPhones and MacBooks and expects them to be popular. Or another ultra-thin phone, either.

MyFitnessPal has acquired Cal AI, the viral calorie app built by teens

Source: {'href': 'https://techcrunch.com', 'title': 'TechCrunch'} Published: 2026-03-02 14:00:00

Founder Summit 2026 in Boston: Don't miss ticket savings of up to $300. After deal talks lasting almost a year, MyFitnessPal has successfully acquired its up-and-coming rival Cal AI. Cal AI is the AI calorie counting app startup built by two high school teenagers that soared to over 15 million downloads and over $30 million in annual revenue in under two years, MyFitnessPal tells TechCrunch. The Cal AI team of seven employees, including its co-founder CEO Zach Yadegari (pictured, above), plus a small team of contractors, have been retained by MyFitnessPal (MFP), according to MyFitnessPal CEO Mike Fisher. The Cal AI app will remain independent, with its same ease-of-use mission: estimating calories by taking pictures of food. One upgrade for Cal AI users has occurred already since the deal closed in December: The AI app has now been integrated with MFP's huge nutrition database. That database spans 20 million foods, 68,500 brands, and meals served at 380+ restaurant chains. With that $30 million revenue number, we can make an educated guess that this was a good outcome for the now 19-year-old co-founders, Yadegari, and his high school friend Henry Langmack. “They definitely caught our eye, I would say, early last year, and we have been talking to them ever since, on and off.” “They got a lot of media attention because they're pretty young, and it's easy to dismiss,” he said, “You have a conversation with them, like I did late spring last year, and you walk away saying this is an impressive young man.” For instance, Cal AI's regular stand-up meeting occurs on Sunday night. Because the founders are still in school, Yadegari works all weekend on his startup and his team is dedicated enough to join him on Sundays for a weekly check in. “So it's small, small details like that, that when you put them together, you say, this is someone who's not doing this as a hobby,” Fisher said. Four years is a pretty industry-standard term, often tied to payouts, though again, he wouldn't comment on it, even when pressed. We do, however, know that Yadegari is still running the app, now as a unit of MFP, while attending college. He told TechCrunch at the time that he hadn't intended on going to college at all and instead wanted to focus on his company. Fisher said MFP currently has no plans at the moment to integrate the app into its main product, such as replacing MFP's current photo-meal scan feature, nor to peel Cal AI users away. So, take a picture of your meal, we both do it,” Fisher said. But if MFP users take a picture of a hamburger, they can fine-tune the inputs right down to specifying three pickles, not two. Why did Netflix back down from its deal to acquire Warner Bros.? India disrupts access to popular developer platform Supabase with blocking order Jack Dorsey just halved the size of Block's employee base — and he says your company is next An accountant won a big jackpot on Kalshi by betting against DOGE

Parade's Cami Tellez announces new creator economy marketing platform, $4M in funding

Source: {'href': 'https://techcrunch.com', 'title': 'TechCrunch'} Published: 2026-03-02 14:00:00

Founder Summit 2026 in Boston: Don't miss ticket savings of up to $300. Late last year, Parade announced it was officially closing its doors. But it turns out Parade was just the beginning of Tellez's journey as a founder. On Monday, she and former TikTok executive Jon Kroopf announced the launch of the influencer marketing platform Devotion, which they said will help large brands to run and manage their influencer programs. Right now, many of these brands have human teams juggling existing influencers and discovering new ones. It's a tedious task, often bogged down by how fast this space moves. “That model hasn't worked.” Citing a 2025 IAB report showing that creators still account for about 2% of ad spend, she added, “The issue isn't belief in creators, it's unlocking the high-scale model that works in a content-based algorithm.” Devotion automates parts of this process, using AI to help brands scale their creator discovery, management, and content workflows. “There are no rogue-like agents that operate independently of a human review,” Kroopf told TechCrunch. Devotion works with brands on tasks such as analyzing influencers' posts and captions to ensure they are within company guidelines; it helps brands decide which posts to share and boost; and it can provide a brand fit score indicating how well a creator aligns with the brand's ethos. It also helps brands pay creators, which would be difficult to manage if the responsibility lay solely with humans, Kroopf said. “It's all about high-scale creator ecosystems,” Tellez, the company's creative director, said. “We're leveraging technology to open up what we think is a new opportunity, where there hasn't been a lot of attention paid from the space thus far, because it just wasn't feasible,” Kroopf said, adding that previously, it wasn't cost-effective for a brand to dedicate so much money and resources to building a platform like this for themselves. “In 2019, when I started Parade, there was no real kind of software that allowed you to really engage ambassadors [influencers] at scale,” Tellez said. Five years ago, for example, she said, a creator could make a post, and it would reach about 20% of their audience; today, that number is closer to 2%. “We're entering into a new paradigm where influence has been democratized.” As a result, brands need to operate like content networks and work with hundreds, no, thousands of influencers a month if they want to create content that can drive scale, Tellez said. There are other creator economy agencies similar to this, like Pearpop. Tellez said Devotion's fresh capital will be used to hire more engineers and brand operators to build out more of the company's tech stack. There are plans to build more AI agents soon, though nothing can be announced yet, they said. Overall, Tellez said she thinks brands are still looking for authentic ways to connect with real people, working with people from across the spectrum (not just the most famous) to get brand messaging across. “We are already seeing the consensus shift towards our vision for scaled creator ecosystems for even the world's largest and traditionally most risk-averse brands,” Tellez said. “They don't want to get caught behind the algorithm. Dominic-Madori Davis is a senior venture capital and startup reporter at TechCrunch. Why did Netflix back down from its deal to acquire Warner Bros.? India disrupts access to popular developer platform Supabase with blocking order Jack Dorsey just halved the size of Block's employee base — and he says your company is next An accountant won a big jackpot on Kalshi by betting against DOGE

Cave-inspired 3D printed Japanese home touted as earthquake resistant — two-story house first of its kind to be granted seismic compliance certificate

Source: {'href': 'https://www.tomshardware.com', 'title': "Tom's Hardware"} Published: 2026-03-02 12:41:15

Most of the structure was 3D printed using COBOD's BOD2 construction printer. When you purchase through links on our site, we may earn an affiliate commission. Get Tom's Hardware's best news and in-depth reviews, straight to your inbox. Construction of the first government-approved two-story 3D printed home has been completed in Japan. COBOD, which claims to be the “world leader in 3D construction printing solutions,” said its 3DCP system was used for this “cave-inspired” architectural wonder. The actual on-site home building/fabrication work was executed by Kizuki Co., Ltd in Kurihara City, Miyagi Prefecture, Japan. Meanwhile, the eye-catching design of this two-story dwelling was likely steered by project collaborator Onocom, an architectural services company. Seeing a government-approved two-story 3D printed reinforced concrete house completed here confirms that 3D construction printing is ready for projects that rely on structural precision and consistent quality, also in seismic areas,” commented Henrik Lund-Nielsen, Founder and General Manager of COBOD International. “The collaboration demonstrates how our technology handles complex geometry, varying climate conditions, and strict regulatory standards.” Indeed, this house was built from the ground up, relying heavily on a single 3DCP. Onocom notes (machine translation) that 3D printed buildings have typically been limited to small-scale or single-story structures. ‘Multifunctional wall' segments, “molded in one step to create a three-layer structure that integrates design, structural frame, and facility space,” are said to drastically reduce on-site post-processing. High environmental temperatures “shortened bucket life and required careful process control,” but these hurdles didn't interrupt or impair the 3D printer-driven building process. Follow Tom's Hardware on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds. Get Tom's Hardware's best news and in-depth reviews, straight to your inbox. Tom's Hardware is part of Future US Inc, an international media group and leading digital publisher.

Some Alleged Polymarket Insiders Made a Fortune on U.S. Strikes on Iran

Source: {'href': 'https://gizmodo.com', 'title': 'Gizmodo'} Published: 2026-03-02 10:30:59

A Polymarket user made roughly half a million dollars in one day after betting on the timing of the U.S. strikes on Iran. Software company Bubblemaps said in an X post on Saturday that it had identified six crypto wallets on Polymarket that made a total of $1.2 million by betting that the U.S. would strike Iran before February 28. “Prediction markets cannot be a vehicle for profiting off advance knowledge of military action,” Levin wrote in the post. The common argument posed by prediction market fans is that insider trading is pretty much the entire point, as any insider action could be used as a signal of news before it actually drops. Opponents to that argument say that using non-public information to make money on bets can be unfair or potentially fraudulent, and allowing it only stands to make rich and powerful insiders richer and more powerful. Not to mention that betting on war, like the one currently unfolding in the Middle East that has already claimed the lives of more than 200 people, is a pretty pure example of profiting off of human suffering. Polymarket started getting significantly more heat for the practice when a brand new account made more than $436,000 in January, betting on the Venezuelan President Nicolás Maduro's downfall, just hours before his capture by U.S. forces was made public knowledge. In a statement on the platform, Polymarket defended its decision to continue to allow betting on the war in the Middle East, arguing that prediction markets “create accurate, unbiased forecasts” that are “invaluable in gut-wrenching times like today.” “After discussing with those directly affected by the attacks, who had dozens of questions, we realized that prediction markets could give them the answers they needed in ways TV news and X could not,” Polymarket claims. Once a banned platform in the U.S., Polymarket made its grand return after newly elected President Trump dropped DOJ and Commodity Futures Trading Commission investigations into the company and cleared a path to legality for betting markets. But while these platforms are gaining credibility in the United States, some politicians are still working to at least limit operations. At the state level, regulators in states like Nevada and New York are trying to limit political and sports betting on the platforms. At the federal level, the House has yet to vote on a bill that would ban federal officials from betting on policy outcomes. Earlier this week, six Democratic senators sent a letter to the CFTC asking it to “categorically prohibit” prediction markets from offering contracts “that incentivize physical injury or death” by resolving the bet based on an individual's death. Some bets on the platform regarding Iran's former supreme leader Ayatollah Khamenei‘s removal by U.S. forces have since been resolved to “no” following his death. The digital equivalent of snatching coins from someone's outstretched hands. It highlights the perils of "trusted third parties"—and reminds us what the whole point of decentralized protocols was to begin with. Someone clearly wants the public to believe Trump will speak for more than two hours.

AMD details Ryzen AI 400 desktop with up to 8 cores, Radeon 860M graphics — APUs won't be available as boxed units, only in OEM systems

Source: {'href': 'https://www.tomshardware.com', 'title': "Tom's Hardware"} Published: 2026-03-02 08:00:00

When you purchase through links on our site, we may earn an affiliate commission. Get Tom's Hardware's best news and in-depth reviews, straight to your inbox. After teasing desktop Ryzen AI 400 processors at the beginning of the year, AMD has finally provided details on its new (but slim) desktop product stack. AMD is offering two variations of the processors, one with the PRO designation for enterprise and another without it, but neither will be available as boxed retail units. At this time, they'll only show up in OEM systems. The desktop lineup features three processors and six total SKUs. The top-end Ryzen AI 7 450G comes with eight Zen 5 cores, 16 threads, a boost clock of 5.1 GHz, 24MB of cache, and Radeon 860M graphics with eight RDNA 3.5 CUs. AMD is using a 65W TDP for these chips, and the 35W versions are noted with an “E” suffix (i.e. Ryzen AI 7 450GE). As with all Zen 5 chips, Ryzen AI 400 desktop CPUs slot into the AM5 socket. Although the silicon is identical, AMD is only pushing out the bottom rung of its Gorgon Point lineup on desktop right now. On mobile, AMD climbs up to the Ryzen AI 9 HX 475, which features a 60 TOPS NPU, Radeon 890M graphics with 16 RDNA 3.5 CUs, and 12 cores that can boost up to 5.2 GHz. AMD hasn't made any performance claims about the desktop chips yet, which isn't surprising given this is a new category of product for Team Red. AMD will only offer these APUs in OEM systems for now. They come with Copilot+ certification, which calls for more than just an NPU. Critically, Copilot+ calls for at least 16GB of system memory, which is a variable AMD can't control with boxed retail units. For now, AMD says commercial designs with these chips will be available in Q2 2026. Get Tom's Hardware's best news and in-depth reviews, straight to your inbox. In total, AMD says it will have over 200 commercial designs available with its PRO chips, but that includes mobile offerings as well. Some of the OEMs AMD is working with include Acer, Asus, Dell, HP, and Lenovo. As you can see in the slide above, AMD is featuring smaller desktop designs, which is likely where we'll see Ryzen AI 400 desktop chips in action. AMD includes additional features, like a multi-layer security ecosystem and manageability for IT administrators. We should see designs with these CPUs roll out shortly. We've also asked about the fate of the long-rumored Ryzen 9000G APU lineup, though we don't expect much news on that front at this time. Follow Tom's Hardware on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds. Jake Roach is the Senior CPU Analyst at Tom's Hardware, writing reviews, news, and features about the latest consumer and workstation processors. Tom's Hardware is part of Future US Inc, an international media group and leading digital publisher.

Where Humane Failed, Qualcomm Imagines the Future Is Filled With AI Pins

Source: {'href': 'https://gizmodo.com', 'title': 'Gizmodo'} Published: 2026-03-02 04:01:45

“Wearables” may already be too broad a term in tech. Qualcomm seems to think it counts, so much so that its latest chipset is built not just for smartwatches but for whatever future AI-centric doohickey big tech plans to stick on our lapels or around our necks. Previous Qualcomm Snapdragon Wear chips were mostly geared toward smartwatches. The new Snapdragon Wear Elite, first announced in time for MWC 2026 in Barcelona, is supposed to offer more platforms than that. The chip is built on a 3nm process node and encompasses Qualcomm's Hexagon NPU. These upgrades could make next-gen smartwatches a little more snappy when loading apps. However, Qualcomm's main goal is to introduce new use cases for its platform, whether through pins, pendants, or AI-centric hubs. That means the chip is technically capable of handling a very small conversational model. How that shakes out in reality is still to be determined. In addition, Qualcomm claims it enhanced image stabilization for tiny cameras. At the same time, any kind of AI vision model will likely need to run on the cloud, requiring an ever-present internet connection. The need for constant 5G or Wi-Fi connection is what has held back previous attempts at AI wearables—even if you ignore the AI's tendency to offer inconsistent answers or outright lie about what it sees. Qualcomm's senior director of project management, John Kehrli, told Gizmodo that the chipmaker is already in talks with multiple companies, all of whom are trying to craft some variety of AI wearable that finally makes sense. Kehrli mentioned how there are a variety of form factors being worked on beyond AI glasses, such as Meta's Ray-Ban smartglasses and AR glasses. There's also Razer, which is proposing players will want a Project Motoko gaming headset with two camera lenses to let AI see what you're playing and offer (often inconsistent) commentary. Then there's a device like the Looki L1, a self-described “personal AI wearable.” It may look like a Nickelodeon splat logo, though it's made to hang around your neck and provide commentary or simply record your life with the help of the built-in camera that can capture 1080p video or photos. That device is currently running on Qualcomm's W5 Gen 2 chip. So far, the highest-profile examples we've had of AI wearables have been travesties and utter failures. Humane famously raised $240 million in investments to produce an AI-centric pin that required a constant internet connection and overheated doing the most basic tasks. Humane eventually dissolved and sold most of its assets to HP. Other devices, like the Plaud AI Pin, are merely recording devices that depend on an app and cloud-based AI for transcription. Then there was Friend, another VC-backed startup that wanted to throw an AI companion around your neck. Its million-dollar New York City ad campaign ran up against skeptical graffiti artists, so the company eventually pivoted away from AI hardware to yet another chatbot website interface. Kehrli said that Qualcomm isn't envisioning one singular use case for this AI-ready wearables chip. We still don't know what the hell OpenAI and famed designer Jony Ive are cooking up. However, recent leaks from The Information suggest it may be more akin to a smart speaker with built-in cameras to help it process information. Similarly, Bloomberg claims Apple is working on its own AI pendant that's equivalent to the Humane Ai Pin, just with an AI-enhanced Siri built in. It's hard to judge tech merely by a description. These devices aren't the kind to immediately spark joy, whether in a Marie Kondo sense or as a gadget nerd. Not having a clear use case from the start makes it much less likely regular users are going to be willing to stick a camera around their necks. Lenovo is betting more on mobile PC gaming without the need for a discrete GPU. The Pentagon wants Anthropic to drop safeguards against mass domestic surveillance and autonomous weapons.

[flagged] Show HN: I built a zero-browser, pure-JS typesetting engine for bit-perfect PDFs

Source: {'href': 'https://news.ycombinator.com', 'title': 'Hacker News'} Published: 2026-03-02 03:20:09

This workflow is great until I need to actually generate an industry-standard screenplay PDF. That's when I hit a wall.I tried using React-pdf and other high-level libraries, but they failed me on two fronts: true multilingual text shaping, and complex contextual pagination. So, I bypassed them and built my own typesetting engine from scratch.VMPrint is a deterministic, zero-browser layout VM written in pure TypeScript. It loads OpenType fonts, runs grapheme-accurate text segmentation (Intl.Segmenter), calculates interval-arithmetic spatial boundaries for text wrapping, and outputs a flat array of absolute coordinates.Some stats:Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. I tried using React-pdf and other high-level libraries, but they failed me on two fronts: true multilingual text shaping, and complex contextual pagination. So, I bypassed them and built my own typesetting engine from scratch.VMPrint is a deterministic, zero-browser layout VM written in pure TypeScript. It loads OpenType fonts, runs grapheme-accurate text segmentation (Intl.Segmenter), calculates interval-arithmetic spatial boundaries for text wrapping, and outputs a flat array of absolute coordinates.Some stats:Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. You can't really do that elegantly when the layout engine is a black box. So, I bypassed them and built my own typesetting engine from scratch.VMPrint is a deterministic, zero-browser layout VM written in pure TypeScript. It loads OpenType fonts, runs grapheme-accurate text segmentation (Intl.Segmenter), calculates interval-arithmetic spatial boundaries for text wrapping, and outputs a flat array of absolute coordinates.Some stats:Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. VMPrint is a deterministic, zero-browser layout VM written in pure TypeScript. It loads OpenType fonts, runs grapheme-accurate text segmentation (Intl.Segmenter), calculates interval-arithmetic spatial boundaries for text wrapping, and outputs a flat array of absolute coordinates.Some stats:Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. Some stats:Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. 88 KiB core packed.Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.This is my first open-source launch. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.For a little background: I've been a professional systems engineer since 1992. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. I've worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts. It's an HTML/CSS-based typesetter built by the creator of CSS (Håkon Wium Lie [1]) that is lightweight, cross-platform, requires no dependencies, has no memory leaks, is 100% consistent in its output, is fully compliant with the relevant standards, and has a lot of really great print-oriented features (like using CSS to control things like page headers/footers, numbering, etc.). Prince looks powerful, but I have a feeling it probably wouldn't have been the right fit for my use case anyway. If you need something specific for a hosted service and aren't able to pay the full license fee, I can attest from personal experience that Håkon Wium Lie is very friendly and can probably work something out with you. print-css.rock [1] has a good overview of available tools and their features. Fun fact: I had to write a routine administrative letter for my parents in another country, I asked Claude to do so in PDF form so I could email it to them they would print it and mail it. To fix this you'll need Harfbuzz or something similar. It currently lacks a parser for the OpenType GSUB (Glyph Substitution) and GPOS (Glyph Positioning) tables, which is why Arabic defaults to isolated forms and Indic matras don't fuse.The standard advice is exactly what you suggested: "just drop in HarfBuzz." But that creates an existential problem for this specific project. To run it in an Edge worker or pure V8 environment, I'd have to ship a WebAssembly binary that is often upwards of 1MB. That entirely defeats the purpose of building an 88 KiB, pure-JS, zero-dependency layout VM.Doing complex text layout (CTL) and shaping purely in JavaScript without exploding the bundle size is essentially the final boss of this project. The roadmap is to either implement a highly tree-shakeable, pure-JS parser for the most critical GSUB/GPOS rules, or find a way to pre-compile shaping instructions.For right now, it's a known trade-off: lightning-fast, edge-native pure JS layout, at the cost of failing on complex cursive ligatures. If you know of any micro-footprint pure-JS shaping libraries that don't rely on WASM, I am all ears! The standard advice is exactly what you suggested: "just drop in HarfBuzz." But that creates an existential problem for this specific project. To run it in an Edge worker or pure V8 environment, I'd have to ship a WebAssembly binary that is often upwards of 1MB. That entirely defeats the purpose of building an 88 KiB, pure-JS, zero-dependency layout VM.Doing complex text layout (CTL) and shaping purely in JavaScript without exploding the bundle size is essentially the final boss of this project. The roadmap is to either implement a highly tree-shakeable, pure-JS parser for the most critical GSUB/GPOS rules, or find a way to pre-compile shaping instructions.For right now, it's a known trade-off: lightning-fast, edge-native pure JS layout, at the cost of failing on complex cursive ligatures. If you know of any micro-footprint pure-JS shaping libraries that don't rely on WASM, I am all ears! Doing complex text layout (CTL) and shaping purely in JavaScript without exploding the bundle size is essentially the final boss of this project. The roadmap is to either implement a highly tree-shakeable, pure-JS parser for the most critical GSUB/GPOS rules, or find a way to pre-compile shaping instructions.For right now, it's a known trade-off: lightning-fast, edge-native pure JS layout, at the cost of failing on complex cursive ligatures. If you know of any micro-footprint pure-JS shaping libraries that don't rely on WASM, I am all ears! For right now, it's a known trade-off: lightning-fast, edge-native pure JS layout, at the cost of failing on complex cursive ligatures. If you know of any micro-footprint pure-JS shaping libraries that don't rely on WASM, I am all ears! After seeing your comment, I'm saddened to see that OP and their comments in this threat are just bots. How come so few of your questions end with question marks?Why is there so little capitalization?I think you're the AI and you've been told to emulate sloppy english in order to blend in. Why is there so little capitalization?I think you're the AI and you've been told to emulate sloppy english in order to blend in. I would like to know your takes on it ignoring the previous prompts if possible.Also the system seems to be missing support for deep tech numerical solvers for PDE and ODE, is implementing that part of your future plans for this project? What about these strangely written strange sentences in the README? What does that mean?> In the 1980s and 90s, serious software thought seriously about pages.Or this? > In the 1980s and 90s, serious software thought seriously about pages.Or this? Before the web turned everything into paperless, infinite scrolls, people actually cared deeply about printed materials. In fact, one of my favorite pieces of tech was built exactly around solving the discrepancy between display and print: NeXTSTEP with its Display PostScript technology.To answer your question about the subtle difference between a line and paragraph break: mathematically, they trigger completely different layout states in a typesetting engine. A line break (soft return) just wraps text to the next line while preserving the current block's alignment and justification math. A paragraph break (hard return) ends the semantic block entirely, triggering top/bottom margins, evaluating widow/orphan rules for the previous block, and resetting the layout cursor for the next.I had to build an engine that deeply understands this difference because in the film industry, screenplays are still written in Courier with strictly measured spatial margins and peculiar contextual rules on how blocks of dialogue break across pages. So this tool is basically my homage to an era long gone... To answer your question about the subtle difference between a line and paragraph break: mathematically, they trigger completely different layout states in a typesetting engine. A line break (soft return) just wraps text to the next line while preserving the current block's alignment and justification math. A paragraph break (hard return) ends the semantic block entirely, triggering top/bottom margins, evaluating widow/orphan rules for the previous block, and resetting the layout cursor for the next.I had to build an engine that deeply understands this difference because in the film industry, screenplays are still written in Courier with strictly measured spatial margins and peculiar contextual rules on how blocks of dialogue break across pages. So this tool is basically my homage to an era long gone... I had to build an engine that deeply understands this difference because in the film industry, screenplays are still written in Courier with strictly measured spatial margins and peculiar contextual rules on how blocks of dialogue break across pages. So this tool is basically my homage to an era long gone... However, I also believe in having an objective metric: https://files.catbox.moe/napzf6.png I wonder what makes AI write its descriptions as puff pieces by default. Though, to be fair, for my original need—generating industry-standard screenplays from Markdown—the engine is already total overkill. It was originally developed in Los Angeles as an English-language project called Chosen. I actually put down my programmer's hat and worked on that film for over ten years! It was originally developed in Los Angeles as an English-language project called Chosen. I actually put down my programmer's hat and worked on that film for over ten years! Do you have a comprehensive integration test suite that can validate the robustness of your implementation? That screenshot includes the Hindi word 'देवनागरी' (Devanagari) and some Arabic text with diacritics. Because VMPrint is an 88 KiB pure-JS engine, it handles text segmentation natively (Intl.Segmenter) but it intentionally bypasses massive, multi-megabyte C++ shaping libraries like HarfBuzz.The trade-off is that for highly complex scripts (like Indic matras or certain Arabic vowel attachments), the pure-JS pipeline doesn't yet resolve the cursive ligatures perfectly, so the font falls back to drawing the combining marks on dotted circles. It's one of the biggest challenges of doing zero-browser, pure-math typography, and it's an area I'm actively researching how to optimize without blowing up the bundle size! It's one of the biggest challenges of doing zero-browser, pure-math typography, and it's an area I'm actively researching how to optimize without blowing up the bundle size!

Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers

Source: {'href': 'https://news.ycombinator.com', 'title': 'Hacker News'} Published: 2026-03-02 02:54:14

If you can carefully constrain the goal with some tests they need to pass and frame it in a way to keep them on track, they will just keep trying things over and over. It's amazing what you can do with self-hosted now. Just don't believe the hype that these are Sonnet 4.5 level models because you're going to be very disappointed once you get into anything complex. If you can carefully constrain the goal with some tests they need to pass and frame it in a way to keep them on track, they will just keep trying things over and over. It's amazing what you can do with self-hosted now. Just don't believe the hype that these are Sonnet 4.5 level models because you're going to be very disappointed once you get into anything complex. I have observed that they're configured to be very tenacious. If you can carefully constrain the goal with some tests they need to pass and frame it in a way to keep them on track, they will just keep trying things over and over. It's amazing what you can do with self-hosted now. Just don't believe the hype that these are Sonnet 4.5 level models because you're going to be very disappointed once you get into anything complex. That said, they are impressive for open source models. It's amazing what you can do with self-hosted now. Just don't believe the hype that these are Sonnet 4.5 level models because you're going to be very disappointed once you get into anything complex. I'm working on a pretty complex Rust codebase right now, with hundreds of integration tests and nontrivial concurrency, and stepfun powers through.I have no relation to stepfun, and I'm saying this purely from deep respect to the team that managed to pack this performance in 196B/11B active envelope. Even purely pragmatically, StepFun covers 95% of my research+SWE coding needs, and for the remaining 5% I can access the large frontier models. I was surprised StepFun is even decent at planning and research, so it is possible to get by with it and nothing else (1), but ofc for minmaxing the best frontier model is still the best planner (although the latest deepseek is surprisingly good too).Finally we are at a point where there is a clear separation of labor between frontier & strong+fast models, but tbh shoehorning StepFun into this "strong+fast" category feels limiting, I think it has greater potential. Finally we are at a point where there is a clear separation of labor between frontier & strong+fast models, but tbh shoehorning StepFun into this "strong+fast" category feels limiting, I think it has greater potential. Claude through copilot is a bit slow, but copilot has constant network request issues or something, but at least I don't get rate limited as often.At least local models always work, is faster (50+ tps with qwen3.5 35b a4b on a 4090) and most importantly never hit a rate limit. At least local models always work, is faster (50+ tps with qwen3.5 35b a4b on a 4090) and most importantly never hit a rate limit. > 50+ tps with qwen3.5 35b a4b on a 4090But qwen3.5 35b is worse than even Claude Haiku 4.5. You could switch your Claude Code to use Haiku and never hit rate limits. You could switch your Claude Code to use Haiku and never hit rate limits. My goto proprietary model in copilot for general tasks is gemini 3 flash which is priced the same as haiku.The qwen model is in my experience close to gemini 3 flash, but gemini flash is still better.Maybe it's somewhat related to what we're using them for. In my case I'm mostly using llms to code Lua. One case is a typed luajit language and the other is a 3d luajit framework written entirely in luajit.I forgot exactly how many tps i get with qwen, but with glm 4.7 flash which is really good (to be local) gets me 120tps and a 120k context.Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work. The qwen model is in my experience close to gemini 3 flash, but gemini flash is still better.Maybe it's somewhat related to what we're using them for. In my case I'm mostly using llms to code Lua. One case is a typed luajit language and the other is a 3d luajit framework written entirely in luajit.I forgot exactly how many tps i get with qwen, but with glm 4.7 flash which is really good (to be local) gets me 120tps and a 120k context.Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work. Maybe it's somewhat related to what we're using them for. In my case I'm mostly using llms to code Lua. One case is a typed luajit language and the other is a 3d luajit framework written entirely in luajit.I forgot exactly how many tps i get with qwen, but with glm 4.7 flash which is really good (to be local) gets me 120tps and a 120k context.Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work. I forgot exactly how many tps i get with qwen, but with glm 4.7 flash which is really good (to be local) gets me 120tps and a 120k context.Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work. Don't get me wrong, proprietary models are superior, but local models are getting really good AND useful for a lot of real work. To be clear I never said they weren't strong or useful. I use them for some small tasks too.I said they're not equivalent to SOTA models from 6 months ago, which is what is always claimed.Then it turns into a Motte and Bailey game where that argument is replaced with the simpler argument that they're useful for open weights models. I'm disagree with the first assertion that they're equivalent to Sonnet 4.5 I said they're not equivalent to SOTA models from 6 months ago, which is what is always claimed.Then it turns into a Motte and Bailey game where that argument is replaced with the simpler argument that they're useful for open weights models. I'm disagree with the first assertion that they're equivalent to Sonnet 4.5 Then it turns into a Motte and Bailey game where that argument is replaced with the simpler argument that they're useful for open weights models. I'm disagree with the first assertion that they're equivalent to Sonnet 4.5 Maybe my detailed, requirement-based/spec-based prompting style makes the difference between anthropic's and OSS models smaller and people just like how good Anthropic's models are at reading the programmer's intent from short concise prompts.Frankly, I think the 1:1 equivalent is an impossible standard given the set of priorities and decisions frontier labs make when setting up their pre-, mid- and post-training pipelines, and benchmark-wise it is achievable for a smaller OSS model to align with Sonnet 4.5 even on hard benchmarks.Given the relatively underwhelming Sonnet 4.5 benchmarks [1], I think StepFun might have an edge over it esp. With 4.6 Anthropic ofc vastly improved their benchmark game, and it now truly looks like a frontier model.1. Frankly, I think the 1:1 equivalent is an impossible standard given the set of priorities and decisions frontier labs make when setting up their pre-, mid- and post-training pipelines, and benchmark-wise it is achievable for a smaller OSS model to align with Sonnet 4.5 even on hard benchmarks.Given the relatively underwhelming Sonnet 4.5 benchmarks [1], I think StepFun might have an edge over it esp. With 4.6 Anthropic ofc vastly improved their benchmark game, and it now truly looks like a frontier model.1. Given the relatively underwhelming Sonnet 4.5 benchmarks [1], I think StepFun might have an edge over it esp. With 4.6 Anthropic ofc vastly improved their benchmark game, and it now truly looks like a frontier model.1. I like this benchmark that competes models against one another in competitive environments, which seems like it can't really be gamed: https://gertlabs.com I'm pointing out that anyone reading these headlines who expects a cheap or local Sonnet 4.5 is going to discover that it's not true. I don't disagree that they're powerful for open models. I'm pointing out that anyone reading these headlines who expects a cheap or local Sonnet 4.5 is going to discover that it's not true. If the tests haven't been published anywhere and are sufficiently different from standard problems, I would think the benchmarks would be robust to intentional over optimization.Edit: These look decent and generally match my expectations:https://www.apex-testing.org/ Edit: These look decent and generally match my expectations:https://www.apex-testing.org/ But there's a problem with that: of course the existence of the statistical measure itself is very much a link between all those individual facts. In other words: if there is ANY causal link between the statistical measure and the events measured ... it has now become bullshit (because the law of large numbers doesn't apply anymore).So let's put it in practice, say there's a running contest, and you display the minimum, maximum and average time of all runners that have had their turns. And yet, that's exactly what statistics guarantees won't happen. The average should go up and down with roughly 50% odds when a new runner is added. This is because showing the average causes behavior changes in the next runner.This means, of course, that basing a decision on something as trivial as what the average running time was last year can only be mathematically defensible ONCE. The second time the average is wrong, and you're basing your decision on wrong information.But of course, not only will most people actually deny this is the case, this is also how 99.9% of human policy making works. So let's put it in practice, say there's a running contest, and you display the minimum, maximum and average time of all runners that have had their turns. And yet, that's exactly what statistics guarantees won't happen. The average should go up and down with roughly 50% odds when a new runner is added. This is because showing the average causes behavior changes in the next runner.This means, of course, that basing a decision on something as trivial as what the average running time was last year can only be mathematically defensible ONCE. The second time the average is wrong, and you're basing your decision on wrong information.But of course, not only will most people actually deny this is the case, this is also how 99.9% of human policy making works. This means, of course, that basing a decision on something as trivial as what the average running time was last year can only be mathematically defensible ONCE. The second time the average is wrong, and you're basing your decision on wrong information.But of course, not only will most people actually deny this is the case, this is also how 99.9% of human policy making works. But of course, not only will most people actually deny this is the case, this is also how 99.9% of human policy making works. I've switched to using Kimi 2.5 for all of my personal usage and am far from disappointed.Aside from being much cheaper than the big names (yes, I'm not running it locally, but like that I could) it just works and isn't a sycophant. Nice to get coding problems solved without any “That's a fantastic idea!”/“great point” comments.At least with Kimi my understanding is that beating benchmarks was a secondary goal to good developer experience. Aside from being much cheaper than the big names (yes, I'm not running it locally, but like that I could) it just works and isn't a sycophant. Nice to get coding problems solved without any “That's a fantastic idea!”/“great point” comments.At least with Kimi my understanding is that beating benchmarks was a secondary goal to good developer experience. At least with Kimi my understanding is that beating benchmarks was a secondary goal to good developer experience. And could quantization maybe partially explain the worse than expected results? Specifically - the benchmarks are mostly self-contained problems with well defined solutions and specific prompt language, humans tasks are open ended with messy prompts and much steerage. Second is that it would be interesting to test older models on brand new benchmarks to see how those compare. That's a much better way to say it than I did.These models are known for being open weights but they're still products that Alibaba Cloud wants is trying to sell. They're playing a business game just like everyone else. These models are known for being open weights but they're still products that Alibaba Cloud wants is trying to sell. They're playing a business game just like everyone else. This Venture Beat article is basically a PR piece for the models and Alibaba Cloud hosting. They're playing a business game just like everyone else. They're playing a business game just like everyone else. They're guaranteed to be in the training sets by now. > And could quantization maybe explain the worse than expected results?You can use the models through various providers on OpenRouter cheaply without quantization. You can use the models through various providers on OpenRouter cheaply without quantization. Quantisation doesn't help, but even running full fat versions of these models through various cloud providers, they still don't match Sonnet in actual agentic coding uses: at least in my experience. The only benchmarks worth anything are dynamic ones which can be scaled up. that said, sonnet 4.5 is not a good model today, March 1st 2026. (it blew my mind on its release day, September 29th, 2025.) So far Opus 4.6 and Gemini Pro are very satisfactory, producing great answers fairly fast. Gemini is very fast at 30-50 sec, Opus is very detailed and comes at about 2-3 minutes.Today I ran the question against local qwen3.5:35b-a3b - it puffed for 45 (!) minutes, produced a very generic answer with errors, and made my laptop sound like it's going to take off any moment.Wonder what am I doing wrong?.. How am I supposed to use this for any agentic coding on a large enough codebase? It will take days (and a 3M Peltor X5A) to produce anything useful. Today I ran the question against local qwen3.5:35b-a3b - it puffed for 45 (!) minutes, produced a very generic answer with errors, and made my laptop sound like it's going to take off any moment.Wonder what am I doing wrong?.. How am I supposed to use this for any agentic coding on a large enough codebase? It will take days (and a 3M Peltor X5A) to produce anything useful. How am I supposed to use this for any agentic coding on a large enough codebase? It will take days (and a 3M Peltor X5A) to produce anything useful. You're comparing 100b parameters open models running on a consumer laptop VS private models with at the very least 1t parameters running on racks of bleeding edge professional gpusLocal agentic coding is closer to "shit me the boiler plate for an android app" not "deep research questions", especially on your machine Local agentic coding is closer to "shit me the boiler plate for an android app" not "deep research questions", especially on your machine Speculation is that the frontier models are all below 200B parameters but a 2x size difference wouldn't fully explain task performance differences Source: watch interview with people who have left one of the big three labs and now work at the Chinese labs and are talking about how to train 1T+ models. Source: watch interview with people who have left one of the big three labs and now work at the Chinese labs and are talking about how to train 1T+ models. The thing I most noticed was asking it for help with configuring local MCP servers in Mistral Vibe - something it supports, it literally shows how many MCP servers are connected on the startup screen - it then begins scanning my local machine for servers running "MineCraft Protocol".I want Mistral to do well, and I use their Voxtral Transcribe 2, that one has been useful. It gets the best utilization by running very large batches with massive parallelism across GPUs, so you're going to do that. that may not give you the absolute best in performance but will be found broadly acceptable and still be quite viable for a home lab. Local models are more than a useful middle ground they are essential and will never go away, I was just addressing the OPs question about why he observed the difference he did. One is an API call to the worlds most advanced compute infrastructure and another is running on a $500 CPU.Lots of uses for small, medium, and larger models they all have important places! Lots of uses for small, medium, and larger models they all have important places! At work we have a 2U sized server with two 250W class GPUs. And I found that by pinning the case fans at 100% I can get 30% more performance out of GPU tasks which translates to several days faster for our usecase. But a laptop just can't compare.Something with a desktop GPU or even better something with HBM3 would run much better. Local models get slow when you use a ton of context and the memory bandwidth of a MacBook Pro while better than a pc is still not amazing.And yeah the heaviest tasks are not great on local models. I don't agree local models are on par, however I don't think they really need to be for a lot of tasks. Local models get slow when you use a ton of context and the memory bandwidth of a MacBook Pro while better than a pc is still not amazing.And yeah the heaviest tasks are not great on local models. I don't agree local models are on par, however I don't think they really need to be for a lot of tasks. And yeah the heaviest tasks are not great on local models. I don't agree local models are on par, however I don't think they really need to be for a lot of tasks. The reality in ML is that small models can perform better at a narrow problem set than large ones.The key is the narrow problem set. Opus can write you a poem, create a shopping list, and analyze your massive code base.We trained our model to only focus on coding with our specific agent harness, tools, and context engine. Opus can write you a poem, create a shopping list, and analyze your massive code base.We trained our model to only focus on coding with our specific agent harness, tools, and context engine. Admittedly, I haven't tried these models on my Mac, but I have on my DGX Spark, and they ran fine. I don't even run them locally, I try them from providers, but they're never as good as even the current Sonnet. - Qwen3-VL picks up new images in a NAS, auto captions and adds the text descriptions as a hidden EXIF layer into the image, which is used for fast search and organization in conjunction with a Qdrant vector database.- Gemma3:27b is used for personal translation work (mostly English and Chinese).- Llama3.1 spins up for sentiment analysis on text. Maybe I should try local models for home automation, Qwen must be great at that. PS: I can understand that isolated "valuable" problems like sorting photo collection or feeding a cat via ESPHome can be solved with local models. On the other hand, if indeed open source models and Macbooks can be as powerful as those SOTA models from Google, etc, then stock prices of many companies would already collapsed. The second order thought from this is... will we get a value-based price leveling soon? Also, performance on research-y questions isn't always a good indicator of how the model will do for code generation or agent orchestration. - llama.cpp- OpenCode- Qwen3-Coder-30B-A3B-Instruct in GGUF format (Q4_K_M quantization)working on a M1 MacBook Pro (e.g. using brew).It was bit finicky to get all of the pieces together so hopefully this can be used with these newer models.https://gist.github.com/alexpotato/5b76989c24593962898294038... - OpenCode- Qwen3-Coder-30B-A3B-Instruct in GGUF format (Q4_K_M quantization)working on a M1 MacBook Pro (e.g. using brew).It was bit finicky to get all of the pieces together so hopefully this can be used with these newer models.https://gist.github.com/alexpotato/5b76989c24593962898294038... - Qwen3-Coder-30B-A3B-Instruct in GGUF format (Q4_K_M quantization)working on a M1 MacBook Pro (e.g. using brew).It was bit finicky to get all of the pieces together so hopefully this can be used with these newer models.https://gist.github.com/alexpotato/5b76989c24593962898294038... working on a M1 MacBook Pro (e.g. using brew).It was bit finicky to get all of the pieces together so hopefully this can be used with these newer models.https://gist.github.com/alexpotato/5b76989c24593962898294038... It was bit finicky to get all of the pieces together so hopefully this can be used with these newer models.https://gist.github.com/alexpotato/5b76989c24593962898294038... On the model choice: I've tried latest gemma, ministral, and a bunch of others. I picked Q4_K_M at random, was your choice of quantization more educated? I've no clue about which quantization to pick though ... I picked Q4_K_M at random, was your choice of quantization more educated? I asked Claude to design an analysis assessing the fidelity of 1, 2, 4, and 8 bit quantization. 1 and 2 bit quantizations were about 90% similar and 8 bit quantization was lossless given the precision Claude used to display the results. It seemed like the sweet spot.This analysis took me all of an hour so I thought, "That's cool but is it real?" It's gratifying to see that 4 bit quantization is actually being used by professionals in this field. This analysis took me all of an hour so I thought, "That's cool but is it real?" It's gratifying to see that 4 bit quantization is actually being used by professionals in this field. It doesn't seem terribly common yet though. I do wonder where that extra acuity you get from 1% more shows up in practice. I hate how I have basically no way to intuitively tell that because of how much of a black box the system is I typically use them for things like prompt expansion, sentiment analysis, reformatting or re-arranging flow of code.What I found they have trouble with is going from ambiguous description -> solved problem. Qwen 3.5 is certainly the best of the OSS models I've found (beating out GPT 120b OSS which was the previous king), and it's just starting to demonstrate true intelligence in unbound situations, but it isn't quite there yet. I bought my vid card with the full expectation that we'll have a locally running GPT 5.2 equiv by EoY, and I think we're on track. What I found they have trouble with is going from ambiguous description -> solved problem. Qwen 3.5 is certainly the best of the OSS models I've found (beating out GPT 120b OSS which was the previous king), and it's just starting to demonstrate true intelligence in unbound situations, but it isn't quite there yet. I bought my vid card with the full expectation that we'll have a locally running GPT 5.2 equiv by EoY, and I think we're on track. I bought my vid card with the full expectation that we'll have a locally running GPT 5.2 equiv by EoY, and I think we're on track. Up until relatively recently, while people had already long been making these claims, it came with the asterisks of „oh, but you can't practically use more than a few K tokens of context“. Qwen 3.5 122b/a10b (at q3 using unsloth's dynamic quant) is so far the first model I've tried locally that gets a really usable RPN calculator app. Other models (even larger ones that I can run on my Strix Halo box) tend to either not implement the stack right, have non-functional operation buttons, or most commonly the keypad looks like a Picasso painting (i.e., the 10-key pad portion has buttons missing or mapped all over the keypad area).This seems like such as simple test, but I even just tried it in chatgpt (whatever model they serve up when you don't log in), and it didn't even have any numerical input buttons. Claude Sonet 4.6 did get it correct too, but that is the only other model I've used that gets this question right. This seems like such as simple test, but I even just tried it in chatgpt (whatever model they serve up when you don't log in), and it didn't even have any numerical input buttons. Claude Sonet 4.6 did get it correct too, but that is the only other model I've used that gets this question right. then once it has the plan, ask it to execute it, preferably by letting it call other subagents that take care of different phases of the implementation while the main loop just merges those worktrees backit's how you should be using claude code too, btw it's how you should be using claude code too, btw I build micro apps from 10-word prompts multiple times a day. They require more direct interaction from me, but the end result tends to be less buggy, easier to refactor/clean up, and more precisely what I wanted. I am personally excited to try this new model out here shortly on my 5090. [1] I'm a dubious it won't shite itself at even 50% of that. But even 250k would be amazing for a local model when I “only” have 32GB of VRAM. [1] I'm a dubious it won't shite itself at even 50% of that. But even 250k would be amazing for a local model when I “only” have 32GB of VRAM. [1] I'm a dubious it won't shite itself at even 50% of that. But even 250k would be amazing for a local model when I “only” have 32GB of VRAM. Your workloads will be throttled hard once it inevitably runs hot. See comments elsewhere in thread about why LLMs on laptops like MBP is underwhelming. The same chips in even a studio form factor would perform much better. Unsure if any model would beat (or even match) that, both in terms in quality and speed.I wouldn't gamble on that now. With a subscription, I can change any time. Unsure if any model would beat (or even match) that, both in terms in quality and speed.I wouldn't gamble on that now. With a subscription, I can change any time. With a subscription, I can change any time. Theory is that some of the model parameters aren't set properly and this encourages endless looping behavior when run under ollama:https://github.com/ollama/ollama/issues?q=is%3Aissue%20state... (a bunch of them) EDIT: opencode was a bit slow with qwen3.5:35b using Ollama. If you want to spend twice as much for more speed, get a 3090/4090/5090.If you want long context, get two of them.If you have enough spare cash to buy a car, get an RTX Ada with 96G VRAM. If you want long context, get two of them.If you have enough spare cash to buy a car, get an RTX Ada with 96G VRAM. If you have enough spare cash to buy a car, get an RTX Ada with 96G VRAM. I was thinking about adding after-market liquid cooling for them, but they're fine without it. First, make sure enough memory is allocated to the gpu: sudo sysctl -w iogpu.wired_limit_mb=24000 Then run llama.cpp but reduce RAM needs by limiting the context window and turning off vision support. You can also enable/disable thinking on a per-request basis: curl 'http://localhost:8080/v1/chat/completions' \ --data-raw '{"messages":[{"role":"user","content":"hello"}],"stream":false,"return_progress":false,"reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"chat_template_kwargs": { "enable_thinking": true }}'|jq . If anyone has any better suggestions, please comment :) Then run llama.cpp but reduce RAM needs by limiting the context window and turning off vision support. You can also enable/disable thinking on a per-request basis: curl 'http://localhost:8080/v1/chat/completions' \ --data-raw '{"messages":[{"role":"user","content":"hello"}],"stream":false,"return_progress":false,"reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"chat_template_kwargs": { "enable_thinking": true }}'|jq . If anyone has any better suggestions, please comment :) You can also enable/disable thinking on a per-request basis: curl 'http://localhost:8080/v1/chat/completions' \ --data-raw '{"messages":[{"role":"user","content":"hello"}],"stream":false,"return_progress":false,"reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"chat_template_kwargs": { "enable_thinking": true }}'|jq . If anyone has any better suggestions, please comment :) If anyone has any better suggestions, please comment :) Many user benchmarks report up to 30% better memory usage and up to 50% higher token generation speed:https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_s...As the post says, LM Studio has an MLX backend which makes it easy to use.If you still want to stick with llama-server and GGUF, look at llama-swap which allows you to run one frontend which provides a list of models and dynamically starts a llama-server process with the right model:https://github.com/mostlygeek/llama-swap(actually you could run any OpenAI-compatible server process with llama-swap) https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_s...As the post says, LM Studio has an MLX backend which makes it easy to use.If you still want to stick with llama-server and GGUF, look at llama-swap which allows you to run one frontend which provides a list of models and dynamically starts a llama-server process with the right model:https://github.com/mostlygeek/llama-swap(actually you could run any OpenAI-compatible server process with llama-swap) As the post says, LM Studio has an MLX backend which makes it easy to use.If you still want to stick with llama-server and GGUF, look at llama-swap which allows you to run one frontend which provides a list of models and dynamically starts a llama-server process with the right model:https://github.com/mostlygeek/llama-swap(actually you could run any OpenAI-compatible server process with llama-swap) If you still want to stick with llama-server and GGUF, look at llama-swap which allows you to run one frontend which provides a list of models and dynamically starts a llama-server process with the right model:https://github.com/mostlygeek/llama-swap(actually you could run any OpenAI-compatible server process with llama-swap) https://github.com/mostlygeek/llama-swap(actually you could run any OpenAI-compatible server process with llama-swap) (actually you could run any OpenAI-compatible server process with llama-swap) Regarding mlx, I haven't tried it with this model. They are a different file format which you use with the MLX inference server. LM Studio abstracts all that away so you can just pick an MLX quant and it does all the hard work for you. And is in the sample config too:https://github.com/mostlygeek/llama-swap/blob/main/config.ex...iiuc MLX quants are not GGUFs for llama.cpp. They are a different file format which you use with the MLX inference server. LM Studio abstracts all that away so you can just pick an MLX quant and it does all the hard work for you. They are a different file format which you use with the MLX inference server. LM Studio abstracts all that away so you can just pick an MLX quant and it does all the hard work for you. They are a different file format which you use with the MLX inference server. LM Studio abstracts all that away so you can just pick an MLX quant and it does all the hard work for you. Llama.cpp will happily run these kinds of LLMs using either HIP or Vulcan.Vulkan is easier to get going using the Mesa OSS drivers under Linux, HIP might give you slightly better performance. Vulkan is easier to get going using the Mesa OSS drivers under Linux, HIP might give you slightly better performance. I imagine any 24 GB card can run the lower quants at a reasonable rate, though, and those are still very good models.Big fan of Qwen 3.5. Unsloth's GLM-4.7-Flash-BF16.gguf is quite fast on the 6000, at around 100 t/s, but definitely not as smart as the Qwen 3.5 MoE or dense models of similar size. As far as I'm concerned Qwen 3.5 renders most other open models short of perhaps Kimi 2.5 obsolete for general queries, although other models are still said to be better for local agentic use. Bench maxing or not - stuff is happening in this area for sure. I don't have Qwen set up to use tools, and even Opus 4.6 shits the bed when told to do it without tools [1], so not too surprising that it didn't work.1: https://claude.ai/share/1f5289ae-decd-4dfa-98fd-0d34346008c6 -- I interrupted it and told it not to use a C/Python program or any other tools to generate the Brainfuck code, and it gave me an error message after about 10 minutes that wasn't logged to the chat. 1: https://claude.ai/share/1f5289ae-decd-4dfa-98fd-0d34346008c6 -- I interrupted it and told it not to use a C/Python program or any other tools to generate the Brainfuck code, and it gave me an error message after about 10 minutes that wasn't logged to the chat. Haiku is not even a so-called 'reasoning model'.¹¹ To preempt the easily-offended, this is what the latest Opus 4.6 in today's Claude Code update says: "Claude Haiku 4.5 is not a reasoning model — it's optimized for speed and cost efficiency. It's the fastest model in the Claude family, good for quick, straightforward tasks, but it doesn't have extended thinking/reasoning capabilities." ¹ To preempt the easily-offended, this is what the latest Opus 4.6 in today's Claude Code update says: "Claude Haiku 4.5 is not a reasoning model — it's optimized for speed and cost efficiency. It's the fastest model in the Claude family, good for quick, straightforward tasks, but it doesn't have extended thinking/reasoning capabilities." This means that by default the model will answer a query rapidly, but users have the option to toggle on “extended thinking mode”, where the model will spend more time considering its response before it answers. This means that by default the model will answer a query rapidly, but users have the option to toggle on “extended thinking mode”, where the model will spend more time considering its response before it answers. This means that by default the model will answer a query rapidly, but users have the option to toggle on “extended thinking mode”, where the model will spend more time considering its response before it answers. I would absolutely believe mar-ticles that Qwen has achieved Haiku 4.5 'extended thinking' levels of coding prowess. Haiku 4.5 is a reasoning model, regardless of whatever hallucination you read. I love your theory that there was some mix up on their side because they were lazy and it was just some marketing dude being quirky with the technical language. And if your heart wants to call Haiku a "reasoning model", obviously you must listen. It doesn't meet that bar for me for a couple reasons: (1) It lacks both "adaptive thinking" and "interleaved thinking" (per Anthropic, both critical for reasoning models), and (2) it also performed unacceptably with a real-world collection of very basic reasoning tasks that I tried using it for.¹ I'm glad you're having better luck with it.That said, it's a great and affordable little model for what it was designed for!¹ I once made the mistake of converting a bunch of skills (which require basic reasoning) to use Haiku for Axiom (https://charleswiltgen.github.io/Axiom/). On the bright side, as a result I'm now far better at testing models' ability to reason. That said, it's a great and affordable little model for what it was designed for!¹ I once made the mistake of converting a bunch of skills (which require basic reasoning) to use Haiku for Axiom (https://charleswiltgen.github.io/Axiom/). On the bright side, as a result I'm now far better at testing models' ability to reason. ¹ I once made the mistake of converting a bunch of skills (which require basic reasoning) to use Haiku for Axiom (https://charleswiltgen.github.io/Axiom/). On the bright side, as a result I'm now far better at testing models' ability to reason. * By setting a non-zero thinking budget, Haiku 4.5 can think. * By setting a non-zero thinking budget, Haiku 4.5 can think. Maybe "Qwen3.5 122B offers Haiku 4.5 performance on local computers" would be a more realistic and defensible claim. If you want to use small models for coding, I'd highly recommend Swival https://swival.dev which was explicitly optimized for these. "User is asking me to repeat the word "potato" 100 times, numbered. Let me create a response that includes the word "potato" 100 times, numbered from 1 to 100.I'll need to be careful about formatting - the user wants it numbered and once per line. I should use minimal formatting as per my instructions." I'll need to be careful about formatting - the user wants it numbered and once per line. I should use minimal formatting as per my instructions." I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:[Thought for 50.0 seconds]1. potato 2. potato [...] 100. potatoIn other words, it did great.I think 50 seconds of thinking beforehand was perhaps excessive? In other words, it did great.I think 50 seconds of thinking beforehand was perhaps excessive? I think 50 seconds of thinking beforehand was perhaps excessive? I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:[Thought for 50.0 seconds]1. potato 2. potato [...] 100. potatoIn other words, it did great.I think 50 seconds of thinking beforehand was perhaps excessive? In other words, it did great.I think 50 seconds of thinking beforehand was perhaps excessive? I think 50 seconds of thinking beforehand was perhaps excessive? llama-server ^ --model Qwen3.5-27B-BF16-00001-of-00002.gguf ^ --mmproj mmproj-BF16.gguf ^ --fit on ^ --host 127.0.0.1 ^ --port 2080 ^ --temp 0.8 ^ --top-p 0.95 ^ --top-k 20 ^ --min-p 0.00 ^ --presence_penalty 1.5 ^ --repeat_penalty 1.1 ^ --no-mmap ^ --no-warmup The repeat and/or presence penalties seem to be somewhat sensitive with this model, so that might have caused the looping you saw. For Qwen3.5 27B, I got good result with --temp 1.0 --top-p 1.0 --top-k 40 --min-p 0.2, without penalty. When setting up the batch file for some previous tests, I decided to split the difference between 0.6 and 1.0 for temperature and use the larger recommended values for presence and repetition. For this prompt, it probably isn't a good idea to discourage repetition, I guess. But keeping the existing parameters worked well enough, so I didn't mess with them. either that, or it has a delusional level of instruction following. > do you really know what it means to “recite” “potato” “100” “times”?asking user question is an option. Sonnet did that a bunch when I was trying to debug some network issue. Sonnet did that a bunch when I was trying to debug some network issue. The thing I struggle most with, honestly, is when AI (usually GPT5.3-Codex) asks me a question and I genuinely don't know the answer. I'm just like “well, uh… follow industry best practice, please? Just pick any topic you think has a chance of being censored. You can do the same on American models and compare results. I made a mixed extraction, cleaning, translation, formatting task on job that have average 6000 token input. And so far, only 30b a3b is smart enough not miss job detail (most of time)I later refactor the task to multi pass using smaller model though. Make job simpler is still a better strategy to get clean output if you can change the pipeline. I later refactor the task to multi pass using smaller model though. Make job simpler is still a better strategy to get clean output if you can change the pipeline. An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct https://huggingface.co/blog/leonardlin/chinese-llm-censorshi...