Andrej Karpathy: Software in the era of AI [video] (youtube.com)

sandslash 14 days ago

gchamonlive 14 days ago

I think it's interesting to juxtapose traditional coding, neural network weights and prompts because in many areas -- like the example of the self driving module having code being replaced by neural networks tuned to the target dataset representing the domain -- this will be quite useful.

However I think it's important to make it clear that given the hardware constraints of many environments the applicability of what's being called software 2.0 and 3.0 will be severely limited.

So instead of being replacements, these paradigms are more like extra tools in the tool belt. Code and prompts will live side by side, being used when convenient, but none a panacea.

karpathy 14 days ago

I kind of say it in words (agreeing with you) but I agree the versioning is a bit confusing analogy because it usually additionally implies some kind of improvement. When I’m just trying to distinguish them as very different software categories.

miki123211 14 days ago

What do you think about structured outputs / JSON mode / constrained decoding / whatever you wish to call it?

To me, it's a criminally underused tool. While "raw" LLMs are cool, they're annoying to use as anything but chatbots, as their output is unpredictable and basically impossible to parse programmatically.

Structured outputs solve that problem neatly. In a way, they're "neural networks without the training". They can be used to solve similar problems as traditional neural networks, things like image classification or extracting information from messy text, but all they require is a Zod or Pydantic type definition and a prompt. No renting GPUs, labeling data and tuning hyperparameters necessary.

They often also improve LLM performance significantly. Imagine you're trying to extract calories per 100g of product, but some product give you calories per serving and a serving size, calories per pound etc. The naive way to do this is a prompt like "give me calories per 100g", but that forces the LLM to do arithmetic, and LLMs are bad at arithmetic. With structured outputs, you just give it the fifteen different formats that you expect to see as alternatives, and use some simple Python to turn them all into calories per 100g on the backend side.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

BobbyJo 14 days ago

The versioning makes sense to me. Software has a cycle where a new tool is created to solve a problem, and the problem winds up being meaty enough, and the tool effective enough, that the exploration of the problem space the tool unlocks is essentially a new category/skill/whatever.

computers -> assembly -> HLL -> web -> cloud -> AI

Nothing on that list has disappeared, but the work has changed enough to warrant a few major versions imo.

NaN years ago

undefined

gchamonlive 13 days ago

> versioning is a bit confusing analogy because it usually additionally implies some kind of improvement

Exactly what I felt. Semver like naming analogies bring their own set of implicit meanings, like major versions having to necessarily supersede or replace the previous version, that is, it doesn't account for coexistence further than planning migration paths. This expectation however doesn't correspond with the rest of the talk, so I thought I might point it out. Thanks for taking the time to reply!

poorcedural 14 days ago

Andrej, maybe Software 3.0 is not written in spoken language like code or prompts. Software 3.0 is recorded in behavior, a behavior that today's software lacks. That behavior is written and consumed by machine and annotated by human interaction. Skipping to 3.0 is premature, but Software 2.0 is a ramp.

NaN years ago

undefined

swyx 13 days ago

no no, it actually is a good analogy in 2 ways:

1) it is a breaking change from the prior version

2) it is an improvement in that, in its ideal/ultimate form, it is a full superset of capabilities of the previous version

gyomu 13 days ago

It's not just the hardware constraints - it's also the training constraints, and the legibility constraints.

Training constraints: you need lots, and lots of data to build complex neural network systems. There are plenty of situations where the data just isn't available to you (whether for legal reasons, technical reasons, or just because it doesn't exist).

Legibility constraints: it is extremely hard to precisely debug and fix those systems. Let's say you build a software system to fill out tax forms - one the "traditional" way, and one that's a neural network. Now your system exhibits a bug where line 58(b) gets sometimes improperly filled out for software engineers who are married, have children, and also declared a source of overseas income. In a traditionally implemented system, you can step through the code and pinpoint why those specific conditions lead to a bug. In a neural network system, not so much.

So totally agreed with you that those are extra tools in the toolbelt - but their applicability is much, much more constrained than that of traditional code.

In short, they excel at situations where we are trying to model an extremely complex system - one that is impossible to nail down as a list of formal requirements - and where we have lots of data available. Signal processing (like self driving, OCR, etc) and human language-related problems are great examples of such problems where traditional programming approaches have failed to yield the kind of results we wanted (ie, beyond human performance) in 70+ years of research and where the modern, neural network approach finally got us the kind of results we wanted.

But if you can define the problem you're trying to solve as formal requirements, then those tools are probably ill-suited.

radicalbyte 14 days ago

Weights are code being replaced by data; something I've been making heavy use of since the early 00s. After coding for 10 years you start to see the benefits of it and understand where you should use it.

LLMs give us another tool only this time it's far more accessible and powerful.

dcsan 13 days ago

LLMs have already replaced some code directly for me eg NLP stuff. Previously I might write a bunch of code to do clustering now I just ask the LLM to group things. Obviously this is a very basic feature native to LLMs but there will be more first class LLM callable functions over time.

OJFord 13 days ago

I'm not sure about the 1.0/2.0/3.0 classification, but it did lead me to think about LLMs as a programming paradigm: we've had imperative & declarative, procedural & functional languages, maybe we'll come to view deterministic vs. probabilistic (LLMs) similarly.

    def __main__:
        You are a calculator. Given an input expression, you compute the result and print it to stdout, exiting 0.
        Should you be unable to do this, you print an explanation to stderr and exit 1.

(and then, perhaps, a bunch of 'DO NOT express amusement when the result is 5318008', etc.)

llflw 13 days ago

Why bother using human language to communicate with a computer? You interact with a computer using a programming language—code—which is more precise and effective. Specifically: → In 1.0, you communicate with computers using compiled code. → In 2.0, you communicate with compilers using high-level programming languages. → In 3.0, you interact with LLMs using prompts, which arguably should not be in natural human language. Nonetheless, you should communicate with AGIs using human language, just as you would with other human beings.

standeven 12 days ago

Why bother using higher-level programming languages to communicate with a computer? You interact with a computer using assembly - raw bit shifting and memory addresses - which is more precise and effective.

NaN years ago

undefined

NaN years ago

undefined

softfalcon 13 days ago

If this is what it comes to, it would explain the many, many software malfunctions in Star Trek. If everything is an LLM/LRM (or whatever super advanced version they have in the 23rd century) then everything can evolve into weird emergent behaviours.

stares at every weird holo-deck episode

kewldev87 12 days ago

[dead]

semiquaver 13 days ago

LLMs are not inherently indeterministic. Batching, temperature, and other things make them appear so when run by big providers but a locally-run LLM model at zero temperature will always produce the same output given the same input.

oytis 13 days ago

That's an improvement, they are still "chaotic" though in that small changes in input can change the output unpredictably strong

NaN years ago

undefined

lmeyerov 12 days ago

That assumes they were implemented with deterministic operators, which isn't the default assumption when using neural network libs on GPUs. Imagine random seeds, cublas optimizations - like you can configure all these things, but I wouldn't assume it, esp in GPU-optimized OSS..

ai-christianson 13 days ago

Why does this remind me of COBOL.

wiz21c 13 days ago

'cos COBOL was designed to be human readable (writable ?).

dheera 13 days ago

    def __main__:
        You run main(). If there are issues, you edit __file__ to try to fix the errors and re-run it. You are determined, persistent, and never give up.

beambot 13 days ago

Output "1" if the program halts; "0" if it doesn't.

NaN years ago

undefined

OJFord 12 days ago

You know, the more I think about it, the more I like this model.

What we have today with ChatGPT and the like (and even IDE integrations and API use) is imperative right, it's like 'answer this question' or 'do this thing for me', it's a function invocation. Whereas the silly calculator program I presented above is (unintentionally) kind of a declarative probabilistic program - it's 'this is the behaviour I want, make it so' or 'I have these constraints and these unknowns, fill in the gaps'.

What if we had something like Prolog, but with the possibility of facts being kind of on-demand at runtime, powered by the LLM driving it?

crsn 13 days ago

This (sort of) is already a paradigm: https://en.m.wikipedia.org/wiki/Probabilistic_programming

stabbles 13 days ago

That's entirely orthogonal.

In probabilistic programming you (deterministically) define variables and formulas. It's just that the variables aren't instances of floats, but represent stochastic variables over floats.

This is similar to libraries for linear algebra where writing A * B * C does not immediately evaluate, but rather builds an expression tree that represent the computation; you need to do say `eval(A * B * C)` to obtain the actual value, and it gives the library room to compute it in the most efficient way.

It's more related to symbolic programming and lazy evaluation than (non-)determinism.

no_wizard 12 days ago

I wonder when companies will remove the personality out of LLMs by default, especially for tools

dingnuts 12 days ago

that would require actually curating the training data and eliminating sources that contain casual conversation

too expensive since those are all licensed sources, much easier to train on Reddit data

NaN years ago

undefined

iLoveOncall 13 days ago

> maybe we'll come to view deterministic vs. probabilistic (LLMs) similarly

I can't believe someone would seriously write this and not realize how nonsensical it is.

"indeterministic programming", you seriously cannot come up with a bigger oxymoron.

diggan 13 days ago

Why do people keep having this reaction to something we're already used to? When you're developing against an API, you're already doing the same thing, planning for what happens when the request hangs, or fails completely, or gives a different response, and so on. Same for basically any IO.

It's almost not even new, just that it generates text instead of JSON, or whatever. But we've already been doing "indeterministic programming" for a long time, where you cannot always assume a function 100% returns what it should all the time.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

aaron695 13 days ago

[flagged]

diggan 13 days ago

> It makes no sense at all, it's cuckooland, are you all on crazy pills?

First step towards understanding something you obviously have strong feelings about, is to try to avoid hitting those triggers while you think about the thing, otherwise it clouds you. Not a requirement by any measure, just a tip.

> are you telling me people will do three years university to learn to prompt?

Are people going to university for three years to write "1.0" or "2.0" software? I certainly didn't, and I don't think even the majority of software developers have done so, at least in my personal experience but YMMV.

> I do not understand where there is anything here to be "not sure" on?

They're not sure about the specific naming, not the concept or talk as a whole.

> LLMs making non-deterministic mistakes

Everything they do is non-deterministic when temperature is set to anything above 0.0, as that's the entire point. The "correct" answers are as non-deterministic as the "mistakes", although I'm not sure "mistake" is correct because it did chose the right/correct tokens, it's just that you didn't like/expect it to chose that particular tokens.

bgwalter 13 days ago

> It makes no sense at all, it's cuckooland, are you all on crazy pills?

Frequent LLM usage impairs thinking. The LLM has no connection to reality, and it takes over people's minds.

NaN years ago

undefined

NaN years ago

undefined

infecto 13 days ago

Do you think you could condense your point of view without hyperbole and rudeness so the rest of us can understand it?

practal 14 days ago

Great talk, thanks for putting it online so quickly. I liked the idea of making the generation / verification loop go brrr, and one way to do this is to make verification not just a human task, but a machine task, where possible.

Yes, I am talking about formal verification, of course!

That also goes nicely together with "keeping the AI on a tight leash". It seems to clash though with "English is the new programming language". So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.

Yes, I am talking about abstraction logic [1], of course :-)

So the goal would be to have English (German, ...) as the ONLY programming language, invisibly backed underneath by abstraction logic.

[1] http://abstractionlogic.com

AdieuToLogic 14 days ago

> So the question is, can you hide the formal stuff under the hood, just like you can hide a calculator tool for arithmetic? Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English?

The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Much like how a property in an API initially defined as being optional cannot be made mandatory without potentially breaking clients, whereas making a mandatory property optional can be backward compatible. IOW, the cardinality of "0 .. 1" is a strict superset of "1".

practal 14 days ago

> The problem with trying to make "English -> formal language -> (anything else)" work is that informality is, by definition, not a formal specification and therefore subject to ambiguity. The inverse is not nearly as difficult to support.

Both directions are difficult and important. How do you determine when going from formal to informal that you got the right informal statement? If you can judge that, then you can also judge if a formal statement properly represents an informal one, or if there is a problem somewhere. If you detect a discrepancy, tell the user that their English is ambiguous and that they should be more specific.

NaN years ago

undefined

lelanthran 13 days ago

> Use informal English on the surface, while some of it is interpreted as a formal expression, put to work, and then reflected back in English? I think that is possible, if you have a formal language and logic that is flexible enough, and close enough to informal English.

That sounds like a paradox.

Formal verification can prove that constraints are held. English cannot. mapping between them necessarily requires disambiguation. How would you construct such a disambiguation algorithm which must, by its nature, be deterministic?

practal 13 days ago

Going from informal to formal can be done using autoformalization [1]. The real question is, how do you judge that the result is correct?

[1] Autoformalization with Large Language Models — https://papers.nips.cc/paper_files/paper/2022/hash/d0c6bc641...

andrepd 13 days ago

Not gonna lie, after skimming the website and a couple preprints for 10 minutes my crank detector is off the charts. Your very vague comments adds to it.

But maybe I just don't understand.

practal 13 days ago

Yes, you just don't understand :-)

I am working on making it simpler to understand, and particularly, simpler to use.

PS: People keep browsing the older papers although they are really outdated. I've updated http://abstractionlogic.com to point to the newest information instead.

redbell 13 days ago

> "English is the new programming language."

For those who missed it, here's the viral tweet by Karpathy himself: https://x.com/karpathy/status/1617979122625712128

throwaway314155 13 days ago

Referenced in the video of course. Not that everyone should watch a 40 minute long video before commenting but his reaction to the "meme" that vibe coding became when his tweet was intended as more of a shower thought is worth checking out.

NaN years ago

undefined

singularity2001 14 days ago

lean 4/5 will be a rising star!

practal 14 days ago

You would definitely think so, Lean is in a great position here!

I am betting though that type theory is not the right logic for this, and that Lean can be leapfrogged.

NaN years ago

undefined

NaN years ago

undefined

kordlessagain 14 days ago

This thread perfectly captures what Karpathy was getting at. We're witnessing a fundamental shift where the interface to computing is changing from formal syntax to natural language. But you can see people struggling to let go of the formal foundations they've built their careers on.

uncircle 13 days ago

> This thread perfectly captures what Karpathy was getting at. We're witnessing a fundamental shift where the interface to computing is changing from formal syntax to natural language.

Yes, telling a subordinate with natural language what you need is called being a product manager. Problem is, the subordinate has encyclopedic knowledge but it's also extremely dumb in many aspects.

I guess this is good for people that got into CS and hate the craft so prefer doing management, but in many cases you still need in your team someone with a IQ higher than room temperature to deliver a product. The only "fundamental" shift here is killing the entry-level coder at the big corp tasked at doing menial and boilerplate tasks, when instead you can hire a mechanical replacement from an AI company for a few hundred dollars a month.

NaN years ago

undefined

NaN years ago

undefined

norir 13 days ago

Have you thought through the downsides of letting go of these formal foundations that have nothing to do with job preservation? This comes across as a rather cynical interpretation of the motivations of those who have concerns.

otabdeveloper4 13 days ago

> We're witnessing a fundamental shift where the interface to computing is changing from formal syntax to natural language.

People have said this every year since the 1950's.

No, it is not happening. LLMs won't help.

Writing code is easy, it's understanding the problem domain is hard. LLMs won't help you understand the problem domain in a formal manner. (In fact they might make it even more difficult.)

NaN years ago

undefined

NaN years ago

undefined

megaman821 13 days ago

Yep, that why I never write anything out using mathmatical expressions. Natural language only baby!

Eggpants 13 days ago

No. Karpathy has long embraced the Silly-con valley “Fake it until you make it” mind set. One of his slides even had a frame of Tesla self driving video that was later revealed to be faked.

It’s in his financial self interest to over inflate LLM’s beyond their “cool math bar trick” level. They are a lossy text compression technique with stolen text sources.

All this “magic” is just function calls behind the scenes doing web/database/math/etc for the LLM.

Anyone who claims LLMs have a soul either truly doesn’t understand how they work (association rules++) or has hitched their financial wagon to this grift. It’s the crypto coin bruhs looking for their next score.

skydhash 13 days ago

Not really. There’s a problem to be solved, and the solution is always best exprimed in formal notation, because we can then let computers do it and not worry about it.

We already have natural languages for human systems and the only way it works is because of shared metaphors and punishment and rewards. Everyone is incentivized to do a good job.

mkleczek 13 days ago

This is why I call all this AI stuff BS.

Using a formal language is a feature, not a bug. It is a cornerstone of all human engineering and scientific activity and is the _reason_ why these disciplines are successful.

What you are describing (ie. ditching formal and using natural language) is moving humanity back towards magical thinking, shamanism and witchcraft.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

neuronic 13 days ago

It's called gatekeeping and the gatekeepers will be the ones left in the dust. This has been proven time and time again. Better learn to go with the flow - judging LLMs on linear improvements or even worse on today's performance is a fool's errand.

Even if improvements level off and start plateauing, things will still get better and for careful guided, educated use LLMs have already become a great accelerator in many ways. StackOverflow is basically dead now which in itself is a fundamental shift from just 3-4 years ago.

dang 14 days ago

This was my favorite talk at AISUS because it was so full of concrete insights I hadn't heard before and (even better) practical points about what to build now, in the immediate future. (To mention just one example: the "autonomy slider".)

If it were up to me, which it very much is not, I would try to optimize the next AISUS for more of this. I felt like I was getting smarter as the talk went on.

kaycebasques 13 days ago

On one hand, I think Karpathy is a gifted educator in a way that's not repeatable as a science. On the other, if the conference leaders next year told every presenter to watch this talk and emulate how Karpathy focuses on concrete insights and suggests what to build now, then the overall quality of presentations would probably trend higher.

hgl 14 days ago

It’s fascinating to think about what true GUI for LLM could be like.

It immediately makes me think a LLM that can generate a customized GUI for the topic at hand where you can interact with in a non-linear way.

karpathy 14 days ago

Fun demo of an early idea was posted by Oriol just yesterday :)

https://x.com/OriolVinyalsML/status/1935005985070084197

spamfilter247 13 days ago

My takeaway from the demo is less that "it's different each time", but more a "it can be different for different users and their styles of operating" - a poweruser can now see a different Settings UI than a basic user, and it can be generated realtime based on the persona context of the user.

Example use case (chosen specifically for tech): An IDE UI that starts basic, and exposes functionality over time as the human developer's skills grow.

superfrank 14 days ago

On one hand, I'm incredibly impressed by the technology behind that demo. On the other hand, I can't think of many things that would piss me off more than a non-deterministic operating system.

I like my tools to be predictable. Google search trying to predict that I want the image or shopping tag based on my query already drives me crazy. If my entire operating system did that, I'm pretty sure I'd throw my computer out a window.

NaN years ago

undefined

hackernewds 14 days ago

it's impressive but it seems like a crappier UX? that none of the patterns can really be memorized

asterisk_ 13 days ago

I feel like one quickly hits a similar partial observability problem as with e.g. light sensors. How often do you wave around annoyed because the light turned off.

To get _truly_ self driving UIs you need to read the mind of your users. It's some heavy tailed distribution all the way down. Interesting research problem on its own.

We already have adaptive UIs (profiles in VSC anyone? Vim, Emacs?) they're mostly under-utilized because takes time to setup + most people are not better at designing their own workflow relative to the sane default.

aprilthird2021 14 days ago

This is crazy cool, even if not necessarily the best use case for this idea

throwaway314155 13 days ago

I would bet good money that many of the functions they chose not to drill down into (such as settings -> volume) do nothing at all or cause an error.

It's a fronted generator. It's fast. That's cool. But is being pitched as a functioning OS generator and I can't help but think it isn't given the failure rates for those sorts of tasks. Further, the success rates for HTML generation probably _are_ good enough for a Holmes-esque (perhaps too harsh) rugpull (again, too harsh) demo.

A cool glimpse into what the future might look like in any case.

superconduct123 13 days ago

That looks both cool and infuriating

suddenlybananas 14 days ago

Having different documents come up every time you go into the documents directory seems hellishly terrible.

NaN years ago

undefined

sensanaty 14 days ago

[flagged]

NaN years ago

undefined

NaN years ago

undefined

cjcenizal 14 days ago

My friend Eric Pelz started a company called Malleable to do this very thing: https://www.linkedin.com/posts/epelz_every-piece-of-software...

whatarethembits 13 days ago

I'm curious where this ends up going.

Personally I think its a mistake; at least at "team" level. One of the most valuable things about a software or framework dictating how things are done is to give a group of people a common language to communicate with and enforce rules. This is why we generally prefer to use a well documented framework, rather than letting a "rockstar engineer" roll their own. Only they will understand its edge cases and ways of thinking, everyone else will pay a price to adapt to that, dragging everyone's productivity down.

Secondly, most people don't know what they want or how they want to work with a specific piece of software. Its simply not important enough, in the hierarchy of other things they care about, to form opinions about how a specific piece of software ought to work. What they want, is the easiest and fastest way to get something done and move on. It takes insight, research and testing to figure out what that is in a specific domain. This is what "product people" are supposed to figure out; not farm it out to individual users.

NaN years ago

undefined

jonny_eh 14 days ago

An ever-shifting UI sounds unlearnable, and therefore unusable.

dang 14 days ago

It wouldn't be unlearnable if it fits the way the user is already thinking.

NaN years ago

undefined

OtherShrezzing 14 days ago

A mixed ever-shifting UI can be excellent though. So you've got some tools which consistently interact with UI components, but the UI itself is altered frequently.

Take for example world-building video games like Cities Skylines / Sim City or procedural sandboxes like Minecraft. There are 20-30 consistent buttons (tools) in the game's UX, while the rest of the game is an unbounded ever-shifting UI.

NaN years ago

undefined

9rx 14 days ago

Tools like v0 are a primitive example of what the above is talking about. The UI maintains familiar conventions, but is laid out dynamically based on surrounding context. I'm sure there are still weird edge cases, but for the most part people have no trouble figuring out how to use the output of such tools already.

sotix 14 days ago

Like Spotify ugh

dpkirchner 14 days ago

Like a HyperCard application?

necrodome 14 days ago

We (https://vibes.diy/) are betting on this

NaN years ago

undefined

stoisesky 14 days ago

This talk https://www.youtube.com/watch?v=MbWgRuM-7X8 explores the idea of generative / malleable personal user interfaces where LLMs can serve as the gateway to program how we want our UI to be rendered.

nbbaier 14 days ago

I love this concept and would love to know where to look for people working on this type of thing!

stuartmemo 14 days ago

It's probably Jira. https://medium.com/question-park/all-aboard-the-ai-train-b03...

semi-extrinsic 14 days ago

Humans are shit at interacting with systems in a non-linear way. Just look at Jupyter notebooks and the absolute mess that arises when you execute code blocks in arbitrary order.

bicepjai 11 days ago

What is the mess you are referring with regards to Jupyter notebooks ?

NaN years ago

undefined

nilirl 14 days ago

Where do these analogies break down?

1. Similar cost structure to electricity, but non-essential utility (currently)?

2. Like an operating system, but with non-determinism?

3. Like programming, but ...?

Where does the programming analogy break down?

PeterStuer 14 days ago

Define non-essenti

The way I see dependency in office ("knowledge") work:

- pre-(computing) history. We are at the office, we work

- dawn of the pc: my computer is down, work halts

- dawn of the lan: the network is down, work halts

- dawn of the Internet: the Internet connection is down, work halts (<- we are basically all here)

- dawn of the LLM: ChatGPT is down, work halts (<- for many, we are here already)

nilirl 14 days ago

I see your point. It's nearing essential.

rudedogg 14 days ago

> programming

The programming analogy is convenient but off. The joke has always been “the computer only does exactly what you tell it to do!” regarding logic bugs. Prompts and LLMs most certainly do not work like that.

I loved the parallels with modern LLMs and time sharing he presented though.

diggan 14 days ago

> Prompts and LLMs most certainly do not work like that.

It quite literally works like that. The computer is now OS + user-land + LLM runner + ML architecture + weights + system prompt + user prompt.

Taken together, and since you're adding in probabilities (by using ML/LLMs), you're quite literally getting "the computer only does exactly what you tell it to do!", it's just that we have added "but make slight variations to what tokens you select next" (temperature>0.0) sometimes, but it's still the same thing.

Just like when you tell the computer to create encrypted content by using some seed. You're getting exactly what you asked for.

politelemon 14 days ago

only in English, and also non-deterministic.

malux85 14 days ago

Yeah, wherever possible I try to have the llm answer me in Python rather than English (especially when explaining new concepts)

English is soooooo ambiguous

NaN years ago

undefined

mikewarot 14 days ago

A few days ago, I was introduced to the idea that when you're vibe coding, you're consulting a "genie", much like in the fables, you almost never get what you asked for, but if your wishes are small, you might just get what you want.

The primagen reviewed this article[1] a few days ago, and (I think) that's where I heard about it. (Can't re-watch it now, it's members only) 8(

[1] https://medium.com/@drewwww/the-gambler-and-the-genie-08491d...

anythingworks 14 days ago

that's a really good analogy! It feels like wicked joke that llms behave in such a way that they're both intelligent and stupid at the same time

fudged71 14 days ago

“You are an expert 10x software developer. Make me a billion dollar app.” Yeah this checks out

abdullin 14 days ago

Tight feedback loops are the key in working productively with software. I see that in codebases up to 700k lines of code (legacy 30yo 4GL ERP systems).

The best part is that AI-driven systems are fine with running even more tight loops than what a sane human would tolerate.

Eg. running full linting, testing and E2E/simulation suite after any minor change. Or generating 4 versions of PR for the same task so that the human could just pick the best one.

bandoti 14 days ago

Here’s a few problems I foresee:

1. People get lazy when presented with four choices they had no hand in creating, and they don’t look over the four and just click one, ignoring the others. Why? Because they have ten more of these on the go at once, diminishing their overall focus.

2. Automated tests, end-to-end sim., linting, etc—tools already exist and work at scale. They should be robust and THOROUGHLY reviewed by both AI and humans ideally.

3. AI is good for code reviews and “another set of eyes” but man it makes serious mistakes sometimes.

An anecdote for (1), when ChatGPT tries to A/B test me with two answers, it’s incredibly burdensome for me to read twice virtually the same thing with minimal differences.

Code reviewing four things that do almost the same thing is more of a burden than writing the same thing once myself.

abdullin 14 days ago

A simple rule applies: "No matter what tool created the code, you are still responsible for what you merge into main".

As such, task of verification, still falls on hands of engineers.

Given that and proper processes, modern tooling works nicely with codebases ranging from 10k LOC (mixed embedded device code with golang backends and python DS/ML) to 700k LOC (legacy enterprise applications from the mainframe era)

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

eddd-ddde 14 days ago

With lazy people the same applies for everything, code they do write, or code they review from peers. The issue is not the tooling, but the hands.

NaN years ago

undefined

NaN years ago

undefined

OvbiousError 14 days ago

I don't think the human is the problem here, but the time it takes to run the full testing suite.

tlb 14 days ago

Yes, and (some near-future) AI is also more patient and better at multitasking than a reasonable human. It can make a change, submit for full fuzzing, and if there's a problem it can continue with the saved context it had when making the change. It can work on 100s of such changes in parallel, while a human trying to do this would mix up the reasons for the change with all the other changes they'd done by the time the fuzzing result came back.

LLMs are worse at many things than human programmers, so you have to try to compensate by leveraging the things they're better at. Don't give up with "they're bad at such and such" until you've tried using their strengths.

NaN years ago

undefined

abdullin 14 days ago

Humans tend to lack inhumane patience.

diggan 14 days ago

It is kind of a human problem too, although that the full testing suite takes X hours to run is also not fun, but it makes the human problem larger.

Say you're Human A, working on a feature. Running the full testing suite takes 2 hours from start to finish. Every change you do to existing code needs to be confirmed to not break existing stuff with the full testing suite, so some changes it takes 2 hours before you have 100% understanding that it doesn't break other things. How quickly do you lose interest, and at what point do you give up to either improve the testing suite, or just skip that feature/implement it some other way?

Now say you're Robot A working on the same task. The robot doesn't care if each change takes 2 hours to appear on their screen, the context is exactly the same, and they're still "a helpful assistant" 48 hours later when they still try to get the feature put together without breaking anything.

If you're feeling brave, you start Robot B and C at the same time.

NaN years ago

undefined

NaN years ago

undefined

londons_explore 14 days ago

The full test suite is probably tens of thousands of tests.

But AI will do a pretty decent job of telling you which tests are most likely to fail on a given PR. Just run those ones, then commit. Cuts your test time from hours down to seconds.

Then run the full test suite only periodically and automatically bisect to find out the cause of any regressions.

Dramatically cuts the compute costs of tests too, which in big codebase can easily become whole-engineers worth of costs.

NaN years ago

undefined

Byamarro 14 days ago

I work in web dev, so people sometimes hook code formatting as a git commit hook or sometimes even upon file save. The tests are problematic tho. If you work at huge project it's a no go idea at all. If you work at medium then the tests are long enough to block you, but short enough for you not to be able to focus on anything else in the meantime.

9rx 14 days ago

Unless you are doing something crazy like letting the fuzzer run on every change (cache that shit), the full test suite taking a long time suggests that either your isolation points are way too large or you are letting the LLM cross isolated boundaries and "full testing suite" here actually means "multiple full testing suites". The latter is an easy fix: Don't let it. Force it stay within a single isolation zone just like you'd expect of a human. The former is a lot harder to fix, but I suppose ending up there is a strong indicator that you can't trust the human picking the best LLM result in the first place and that maybe this whole thing isn't a good idea for the people in your organization.

yahoozoo 14 days ago

The problem is that every time you run your full automation with linting and tests, you’re filling up the context window more and more. I don’t know how people using Claude do it with its <300k context window. I get the “your message will exceed the length of this chat” message so many times.

diggan 14 days ago

I don't know exactly how Claude works, but the way I work around this with my own stuff is prompting it to not display full outputs ever, and instead temporary redirect the output somewhere then grep from the log-file what it's looking for. So a test run outputting 10K lines of test output and one failure is easily found without polluting the context with 10K lines.

abdullin 14 days ago

Claude's approach is currently a bit dated.

Cursor.sh agents or especially OpenAI Codex illustrate that a tool doesn't need to keep on stuffing context window with irrelevant information in order to make progress on a task.

And if really needed, engineers report that Gemini Pro 2.5 keeps on working fine within 200k-500k token context. Above that - it is better to reset the context.

the_mitsuhiko 14 days ago

I started to use sub agents for that. That does not pollute the context as much

elif 14 days ago

In my experience with Jules and (worse) Codex, juggling multiple pull requests at once is not advised.

Even if you tell the git-aware Jules to handle a merge conflict within the context window the patch was generated, it is like sorry bro I have no idea what's wrong can you send me a diff with the conflict?

I find i have to be in the iteration loop at every stage or else the agent will forget what it's doing or why rapidly. for instance don't trust Jules to run your full test suite after every change without handholding and asking for specific run results every time.

It feels like to an LLM, gaslighting you with code that nominally addresses the core of what you just asked while completely breaking unrelated code or disregarding previously discussed parameters is an unmitigated success.

layer8 14 days ago

> Tight feedback loops are the key in working productively with software. […] even more tight loops than what a sane human would tolerate.

Why would a sane human be averse to things happening instantaneously?

latexr 14 days ago

> Or generating 4 versions of PR for the same task so that the human could just pick the best one.

That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality. Why are we doing this to ourselves and embracing it?

A few years ago, it would have been seen as a joke to say “the future of software development will be to have a million monkey interns banging on one million keyboards and submit a million PRs, then choose one”. Today, it’s lauded as a brilliant business and cost-saving idea.

We’re beyond doomed. The first major catastrophe caused by sloppy AI code can’t come soon enough. The sooner it happens, the better chance we have to self-correct.

chamomeal 14 days ago

I say this all the time!

Does anybody really want to be an assembly line QA reviewer for an automated code factory? Sounds like shit.

Also I can’t really imagine that in the first place. At my current job, each task is like 95% understanding all the little bits, and then 5% writing the code. If you’re reviewing PRs from a bot all day, you’ll still need to understand all the bits before you accept it. So how much time is that really gonna save?

NaN years ago

undefined

ponector 14 days ago

>That sounds awful.

Not for the cloud provider. AWS bill to the moon!

osigurdson 14 days ago

I'm not sure that AI code has to be sloppy. I've had some success with hand coding some examples and then asking codex to rigorously adhere to prior conventions. This can end up with very self consistent code.

Agree though on the "pick the best PR" workflow. This is pure model training work and you should be compensated for it.

NaN years ago

undefined

bonoboTP 14 days ago

If it's monkeylike quality and you need a million tries, it's shit. It you need four tries and one of those is top-tier professional programmer quality, then it's good.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

diggan 14 days ago

> A truly terrible and demotivating way to work and produce anything of real quality

You clearly have strong feelings about it, which is fine, but it would be much more interesting to know exactly why it would terrible and demotivating, and why it cannot produce anything of quality? And what is "real quality" and does that mean "fake quality" exists?

> million monkey interns banging on one million keyboards and submit a million PRs

I'm not sure if you misunderstand LLMs, or the famous "monkeys writing Shakespeare" part, but that example is more about randomness and infinity than about probabilistic machines somewhat working towards a goal with some non-determinism.

> We’re beyond doomed

The good news is that we've been doomed for a long time, yet we persist. If you take a look at how the internet is basically held up by duct-tape at this point, I think you'd feel slightly more comfortable with how crap absolutely everything is. Like 1% of software is actually Good Software while the rest barely works on a good day.

NaN years ago

undefined

NaN years ago

undefined

koakuma-chan 14 days ago

> That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality

This is the right way to work with generative AI, and it already is an extremely common and established practice when working with image generation.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

sothatsit 14 days ago

I find Karpathy's focus on tightening the feedback loop between LLMs and humans interesting, because I've found I am the happiest when I extend the loop instead.

When I have tried to "pair program" with an LLM, I have found it incredibly tedious, and not that useful. The insights it gives me are not that great if I'm optimising for response speed, and it just frustrates me rather than letting me go faster. Worse, often my brain just turns off while waiting for the LLM to respond.

OTOH, when I work in a more async fashion, it feels freeing to just pass a problem to the AI. Then, I can stop thinking about it and work on something else. Later, I can come back to find the AI results, and I can proceed to adjust the prompt and re-generate, to slightly modify what the LLM produced, or sometimes to just accept its changes verbatim. I really like this process.

geeunits 14 days ago

I would venture that 'tightening the feedback loop' isn't necessarily 'increasing the number of back and forth prompts'- and what you're saying you want is ultimately his argument. i.e. if integral enough it can almost guess what you're going to say next...

sothatsit 14 days ago

I specifically do not want AI as an auto-correct, doing auto-predictions while I am typing. I find this interrupts my thinking process, and I've never been bottlenecked by typing speed anyway.

I want AI as a "co-worker" providing an alternative perspective or implementing my specific instructions, and potentially filling in gaps I didn't think about in my prompt.

jwblackwell 14 days ago

Yeah I am currently enjoying giving the LLM relatively small chunks of code to write and then asking it to write accompanying tests. While I focus on testing the product myself. I then don't even bother to read the code it's written most of the time

blobbers 14 days ago

Software 3.0 is the code generated by the machine, not the prompts that generated it. The prompts don't even yield the same output; there is randomness.

The new software world is the massive amount of code that will be burped out by these agents, and it should quickly dwarf the human output.

pelagicAustral 14 days ago

I think that if you give the same task to three different developers you'll get three different implementations. It's not a random result if you do get the functionality that was expected, and at that, I do think the prompt plays an important role in offering a view of how the result was achieved.

klabb3 14 days ago

> I think that if you give the same task to three different developers you'll get three different implementations.

Yes, but if you want them to be compatible you need to define a protocol and conformance test suite. This is way more work than writing a single implementation.

The code is the real spec. Every piece of unintentional non-determinism can be a hazard. That’s why you want the code to be the unit of maintenance, not a prompt.

NaN years ago

undefined

blobbers 12 days ago

Interestingly, I was generating some scraping code today. I prompted it fairly generically and it decided to spit out some Selenium code. I reported the stack trace failure, and it gave me a new script with playwright. That failed also and it gave me some suggestions to fix it. I asked it to update the whole script rather than snippets, and it responded with "Hey let's not use either of these and here we'll use the site's API." and proceeded to do that.

Kind of crazy, it basically found 3 different hammers to hit the nail I wanted. The API unfortunately seems to be timeing out (I had to add the timeout=10 to the post u_u)

fritzo 13 days ago

Code is read much more often than it is written. Code generated by the machine today will be prompt read by the machine going forward. It's a closed loop.

Software is a world in motion. Software 1.0 was animated by developers pushing it around. Software 3.0 is additionally animated by AI agents.

tamersalama 14 days ago

How I understood it is that natural language will form relatively large portions of stacks (endpoint descriptions, instructions, prompts, documentations, etc…). In addition to code generated by agents (which would fall under 1.0)

poorcedural 14 days ago

It is not the code, which just like prompts is a written language. Software 3.0 will be branches of behaviors, by the software and by the users all documented in a feedback loop. The best behaviors will be merged by users and the best will become the new HEAD. Underneath it all will be machine code for the hardware, but it will be the results that dictate progress.

beacon294 14 days ago

What is this "clerk" library he used at this timestamp to tell him what to do? https://youtu.be/LCEmiRjPEtQ?si=XaC-oOMUxXp0DRU0&t=1991

Gemini found it via screenshot or context: https://clerk.com/

This is what he used for login on MenuGen: https://karpathy.bearblog.dev/vibe-coding-menugen/

xnx 14 days ago

That blog post is a great illustration that most of the complexity/difficulty of a web app is in the hosting and not in the useful code.

fullstackchris 13 days ago

clerk is an auth library - and finally one that doesnt require dozens of lines to do things like, i dont know, check if the user is logged in

and wild... you used gemini to process a screenshot to find the website for a 5 letter word library?

alightsoul 12 days ago

Not gemini but google lens. Maybe gemini already has some agentic capabilities

wjohn 14 days ago

The comparison of our current methods of interacting with LLMs (back and forth text) to old-school terminals is pretty interesting. I think there's still a lot work to be done to optimize how we interact with these models, especially for non-dev consumers.

informal007 14 days ago

Audio maybe the better option.

recursive 13 days ago

Based on my experience with voicemail, I'd say that audio is not always best, and is sometimes in the running for worst.

magicloop 13 days ago

I think this is a brilliant talk and truly captures the "zeitgeist" of our times. He sees the emergent patterns arising as software creation is changing.

I am writing a hobby app at the moment and I am thinking about its architecture in a new way now. I am making all my model structures comprehensible so that LLMs can see the inside semantics of my app. I merely provide a human friendly GUI over the top to avoid the linear wall-of-text problem you get when you want to do something complex via a chat interface.

We need to meet LLMs in the middle ground to leverage the best of our contributions - traditional code, partially autonomous AI, and crafted UI/UX.

Part of, but not all of, programming is "prompting well". It goes along with understanding the imperative aspects, developing a nose for code smells, and the judgement for good UI/UX.

I find our current times both scary and exciting.

poorcedural 13 days ago

[dead]

johnwheeler 13 days ago

I am actually working on building a semantic TypeScript server right now. It's going really good, check it out. https://github.com/screencam/typescript-mcp-server

polishdude20 12 days ago

With your tool can I hook it up to Cursor and just have it use it?

NaN years ago

undefined

anythingworks 14 days ago

loved the analogies! Karpathy is consistently one of the clearest thinkers out there.

interesting that Waymo could do uninterrupted trips back in 2013, wonder what took them so long to expand? regulation? tailend of driving optimization issues?

noticed one of the slides had a cross over 'AGI 2027'... ai-2027.com :)

AlotOfReading 14 days ago

You don't "solve" autonomous driving as such. There's a long, slow grind of gradually improving things until failures become rare enough.

petesergeant 14 days ago

I wonder at what point all the self-driving code becomes replaceable with a multimodal generalist model with the prompt “drive safely”

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

ActorNightly 14 days ago

> Karpathy is consistently one of the clearest thinkers out there.

Eh, he ran Teslas self driving division and put them into a direction that is never going to fully work.

What they should have done is a) trained a neural net to represent sequence of frames into a physical environment, and b)leveraged Mu Zero, so that self driving system basically builds out parallel simulations into the future, and does a search on the best course of action to take.

Because thats pretty much what makes humans great drivers. We don't need to know what a cone is - we internally compute that something that is an object on the road that we are driving towards is going to result in a negative outcome when we collide with it.

AlotOfReading 14 days ago

Aren't continuous, stochastic, partial knowledge environments where you need long horizon planning with strict deadlines and limited compute exactly the sort of environments muzero variants struggle with? Because that's driving.

It's also worth mentioning that humans intentionally (and safely) drive into "solid" objects all the time. Bags, steam, shadows, small animals, etc. We also break rules (e.g. drive on the wrong side of the road), and anticipate things we can't even see based on a theory of mind of other agents. Human driving is extremely sophisticated, not reducible to rules that are easily expressed in "simple" language.

NaN years ago

undefined

visarga 14 days ago

> We don't need to know what a cone is

The counter argument is that you can't zoom in and fix a specific bug in this mode of operation. Everything is mashed together in the same neural net process. They needed to ensure safety, so testing was crucial. It is harder to test an end-to-end system than its individual parts.

impossiblefork 14 days ago

I don't think that would have worked either.

But if they'd gone for radars and lidars and a bunch of sensors and then enough processing hardware to actually fuse that, then I think they could have built something that had a chance of working.

NaN years ago

undefined

suddenlybananas 14 days ago

That's absolutely not what makes humans great drivers?

NaN years ago

undefined

tayo42 14 days ago

Is that the approach that waymo uses?

NaN years ago

undefined

mkw5053 13 days ago

This DevOps friction is exactly why I'm building an open-source "Firebase for LLMs." The moment you want to add AI to an app, you're forced to build a backend just to securely proxy API calls—you can't expose LLM API keys client-side. So developers who could previously build entire apps backend-free suddenly need servers, key management, rate limiting, logging, deployment... all just to make a single OpenAI call. Anyone else hit this wall? The gap between "AI-first" and "backend-free" development feels very solvable.

smpretzer 13 days ago

I think this lines up with Apple’s thesis of on-device models being a useful feature for developers who don’t want to deal with calling out the OpenAI

https://developer.apple.com/documentation/foundationmodels

sockboy 13 days ago

Yeah, hit this exact wall building a small AI tool. Ended up spinning up a whole backend just to keep the keys safe. Feels like there should be a simpler way, but haven’t seen anything that’s truly plug-and-play yet. Curious to see what you’re working on.

dieortin 13 days ago

It’s very obvious this account was just created to promote your product…

NaN years ago

undefined

androng 13 days ago

I think the way the friction could be reduced to almost zero was through OpenAI "custom GPTs" https://help.openai.com/en/articles/8554397-creating-a-gpt or "Alexa skills". how much easier can it get than the user using their own OpenAI account? Of course I'd rather have them on my own website but if were talking complete ease of use then I think that is a contender

mkw5053 13 days ago

Fair point. I'm no expert in custom GPTs, I wonder what limitations there would be beyond the obvious branding and UI/UX control. Like, how far can someone customize a custom GPT (ha). I imagine any multi-step/agentic flows might be a challenge or impossible as it currently exists. It also seems like custom GPTs have been completely forgotten, but I very well could be wrong and OpenAI announced a big investment in them and new features tomorrow.

jeremyjh 13 days ago

Do you think Firebase and Superbase are working on this? Good luck but to me it sounds like a platform feature, not a standalone product.

mkw5053 13 days ago

Probably some sort. In the meantime it doesn't currently exist and I want it for myself. I also feel like having something open source and that allows you to bring your own LLM provider might still be useful.

androng 13 days ago

[dead]

khalic 14 days ago

His dismissal of smaller and local models suggests he underestimates their improvement potential. Give phi4 a run and see what I mean.

mprovost 14 days ago

You can disagree with his conclusions but I don't think his understanding of small models is up for debate. This is the person who created micrograd/makemore/nanoGPT and who has produced a ton of educational materials showing how to build small and local models.

khalic 14 days ago

I’m going to edit, it was badly formulated, he underestimates their potential for growth is what I meant by that

NaN years ago

undefined

diggan 14 days ago

> suggests a lack of understanding of these smaller models capabilities

If anything, you're showing a lack of understanding of what he was talking about. The context is this specific time, where we're early in a ecosystem and things are expensive and likely centralized (ala mainframes) but if his analogy/prediction is correct, we'll have a "Linux" moment in the future where that equation changes (again) and local models are competitive.

And while I'm a huge fan of local models run them for maybe 60-70% of what I do with LLMs, they're nowhere near proprietary ones today, sadly. I want them to, really badly, but it's important to be realistic here and realize the differences of what a normal consumer can run, and what the current mainframes can run.

khalic 14 days ago

He understands the technical part, of course, I was referring to his prediction that large models will be always be necessary.

There is a point where an LLM is good enough for most tasks, I don’t need a megamind AI in order to greet clients, and both large and small/medium model size are getting there, with the large models hitting a computing/energy demand barrier. The small models won’t hit that barrier anytime soon.

NaN years ago

undefined

khalic 14 days ago

I edited to make it clearer

sriram_malhar 14 days ago

Of all the things you could suggest, a lack of understanding is not one that can be pinned on Karpathy. He does know his technical stuff.

khalic 14 days ago

We all have blind spots

NaN years ago

undefined

TeMPOraL 14 days ago

He ain't dismissing them. Comparing local/"open" model to Linux (and closed services to Windows and MacOS) is high praise. It's also accurate.

khalic 14 days ago

This is a bad comparison

dist-epoch 14 days ago

I tried the local small models. They are slow, much less capable, and ironically much more expensive to run than the frontier cloud models.

khalic 14 days ago

Phi4-mini runs on a basic laptop CPU at 20T/s… how is that slow? Without optimization…

NaN years ago

undefined

nico 14 days ago

Thank you YC for posting this before the talk became deprecated[1]

1: https://x.com/karpathy/status/1935077692258558443

sandslash 14 days ago

We couldn't let that happen!

amai 14 days ago

The quite good blog post mentioned by Karpathy for working with LLMs when building software:

- https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/

mkw5053 13 days ago

I like the idea of having a single source of truth RULES.md, however I'm wondering why you used symlinks as opposed to the ability to link/reference other files in cursor rules, CLAUDE.md, etc. I understand that functionality doesn't exist for all coding agents, but I think it gives you more flexibility when composing rules files (for example you can have the standard cursor rules headers and then point to @RULES.md lower in the file)

ramraj07 14 days ago

[flagged]

yusina 14 days ago

Brutal counter take: If AI tooling makes you so much better, then you started very low. In contrast, if you are already insanely productive in creative ways others can hardly achieve then chances are, AI tools don't make much of a difference.

NaN years ago

undefined

eitally 14 days ago

It's going to be very interesting to see how things evolve in enterprise IT, especially but not exclusively in regulated industries. As more SaaS services are at least partly vibe coded, how are CIOs going to understand and mitigate risk? As more internal developers are using LLM-powered coding interfaces and become less clear on exactly how their resulting code works, how will that codebase be maintained and incrementally updated with new features, especially in solo dev teams (which is common)?

I easily see a huge future for agentic assistance in the enterprise, but I struggle mightily to see how many IT leaders would accept the output code of something like a menugen app as production-viable.

Additionally, if you're licensing code from external vendors who've built their own products at least partly through LLM-driven superpowers, how do you have faith that they know how things work and won't inadvertently break something they don't know how to fix? This goes for niche tools (like Clerk, or Polar.sh or similar) as much as for big heavy things (like a CRM or ERP).

I was on the CEO track about ten years ago and left it for a new career in big tech, and I don't envy the folks currently trying to figure out the future of safe, secure IT in the enterprise.

charlie0 14 days ago

It will succeed due to the same reason other sloppy strategies succeed, it has large short term gains and moves risk into the nebulous future. Management LOVES these types of things.

r2b2 14 days ago

I've found that as LLMs improve, some of their bugs become increasingly slippery - I think of it as the uncanny valley of code.

Put another way, when I cause bugs, they are often glaring (more typos, fewer logic mistakes). Plus, as the author it's often straightforward to debug since you already have a deep sense for how the code works - you lived through it.

So far, using LLMs has downgraded my productivity. The bugs LLMs introduce are often subtle logical errors, yet "working" code. These errors are especially hard to debug when you didn't write the code yourself — now you have to learn the code as if you wrote it anyway.

I also find it more stressful deploying LLM code. I know in my bones how carefully I write code, due to a decade of roughly "one non critical bug per 10k lines" that keeps me asleep at night. The quality of LLM code can be quite chaotic.

That said, I'm not holding my breath. I expect this to all flip someday, with an LLM becoming a better and more stable coder than I am, so I guess I will keep working with them to make sure I'm proficient when that day comes.

thegeomaster 14 days ago

I have been using LLMs for coding a lot during the past year, and I've been writing down my observations by task. I have a lot of tasks where my first entry is thoroughly impressed by how e.g. Claude helped me with a task, and then the second entry is a few days after when I'm thoroughly irritated by chasing down subtle and just _strange_ bugs it introduced along the way. As a rule, these are incredibly hard to find and tedious to debug, because they lurk in the weirdest places, and the root cause is usually some weird confabulation that a human brain would never concoct.

throw234234234 13 days ago

Saw a recent talk where someone described AI as making errors, but not errors that a human would naturally make and are usually "plausible but wrong" answers. i.e. the errors that these AI's make are of a different nature than what a human would do. This is the danger - that reviews now are harder; I can't trust it as much as a person coding at present. The agent tools are a little better (Claude Code, Aider, etc) in that they can at least take build and test output but even then I've noticed it does things that are wrong but are "plausible and build fine".

I've noticed it in my day-to-day: an AI PR review is different than if I get the PR from a co-worker with different kinds of problems. Unfortunately the AI issues seem to be more of the subtle kind - the things if I'm not diligent could sneak into production code. It means reviews are more important, and I can't rely on previous experience of a co-worker and the typical quality of their PR's - every new PR is a different worker effectively.

DanHulton 14 days ago

I'm curious where that expectation of the flip comes from? Your experience (and mine, frankly) would seem to indicate the opposite, so from whence comes this certainty that one day it'll change entirely and become reliable instead?

I ask (and I'll keep asking) because it really seems like the prevailing narrative is that these tools have improved substantially in a short period of time, and that is seemingly enough justification to claim that they will continue to improve until perfection because...? waves hands vaguely

Nobody ever seems to have any good justification for how we're going to overcome the fundamental issues with this tech, just a belief that comes from SOMEWHERE that it'll happen anyway, and I'm very curious to drill down into that belief and see if it comes from somewhere concrete or it's just something that gets said enough that it "becomes true", regardless of reality.

dapperdrake 14 days ago

Just like when all regulated industries started only using decision trees and ordinary least-squares regression instead of any other models.

gosub100 14 days ago

> how many IT leaders would accept the output code of something like a menugen app as production-viable.

probably all of the ones at microsoft

imiric 14 days ago

The slide at 13m claims that LLMs flip the script on technology diffusion and give power to the people. Nothing could be further from the truth.

Large corporations, which have become governments in all but name, are the only ones with the capability to create ML models of any real value. They're the only ones with access to vast amounts of information and resources to train the models. They introduce biases into the models, whether deliberately or not, that reinforces their own agenda. This means that the models will either avoid or promote certain topics. It doesn't take a genius to imagine what will happen when the advertising industry inevitably extends its reach into AI companies, if it hasn't already.

Even open weights models which technically users can self-host are opaque blobs of data that only large companies can create, and have the same biases. Even most truly open source models are useless since no individual has access to the same large datasets that corporations use for training.

So, no, LLMs are the same as any other technology, and actually make governments and corporations even more powerful than anything that came before. The users benefit tangentially, if at all, but will mostly be exploited as usual. Though it's unsurprising that someone deeply embedded in the AI industry would claim otherwise.

moffkalast 14 days ago

Well there are cases like OLMo where the process, dataset, and model are all open source. As expected though, it doesn't really compare well to the worst closed model since the dataset can't contain vast amounts of stolen copyrighted data that noticeably improves the model. Llama is not good because Meta knows what they're doing, it's good because it was pretrained on the entirety of Anna's Archive and every pirated ebook they could get their hands on. Same goes for Elevenlabs and pirated audiobooks.

Lack of compute on the Ai2's side also means the context OLMo is trained for is miniscule, the other thing that you need to throw brazillions of dollars at to make model that's maybe useful in the end if you're very lucky. Training needs high GPU interconnect bandwidth, it can't be done in distributed horde in any meaningful way even if people wanted to.

The only ones who have the power now are the Chinese, since they can easily ignore copyright for datasets, patents for compute, and have infinite state funding.

iLoveOncall 14 days ago

He sounds like Terrence Howard with his nonsense.

whilenot-dev 13 days ago

I watched Karpathy's Intro to Large Language Models[0] not so long ago and must say that I'm a bit confused by this presentation, and it's a bit unclear to me what it adds.

1,5 years ago he saw all the tool uses in agent systems as the future of LLMs, which seemed reasonable to me. There was (and maybe still is) potential for a lot of business cases to be explored, but every system is defined by its boundaries nonetheless. We still don't know all the challenges we face at that boundaries, whether these could be modelled into a virtual space, handled by software, and therefor also potentially AI and businesses.

Now it all just seems to be analogies and what role LLMs could play in our modern landscape. We should treat LLMs as encapsulated systems of their own ...but sometimes an LLM becomes the operating system, sometimes it's the CPU, sometimes it's the mainframe from the 60s with time-sharing, a big fab complex, or even outright electricity itself?

He's showing an iOS app, which seems to be, sorry for the dismissive tone, an example for a better looking counter. This demo app was in a presentable state for a demo after a day, and it took him a week to implement Googles OAuth2 stuff. Is that somehow exciting? What was that?

The only way I could interpret this is that it just shows a big divide we're currently in. LLMs are a final API product for some, but an unoptimized generative software-model with sophisticated-but-opaque algorithms for others. Both are utterly in need for real world use cases - the product side for the fresh training data, and the business side for insights, integrations and shareholder value.

Am I all of a sudden the one lacking imagination? Is he just slurping the CEO cool aid and still has his investments in OpenAI? Can we at least agree that we're still dealing with software here?

[0]: https://www.youtube.com/watch?v=zjkBMFhNj_g

bwfan123 13 days ago

> Am I all of a sudden the one lacking imagination?

No, The reality of what these tools can do is sinking in.. The rubber is meeting the road and I can hear some screaching.

The boosters are in 5 stages of grief coming to terms with what was once AGI and is now a mere co-pilot, while the haters are coming to terms with the fact that LLMs can actually be useful in a variety of usecases.

acedTrex 13 days ago

I actually quite agree with this, there is some reckoning on both sides happening. It's quite entertaining to watch, a bit painful as well of course as someone who is on the "they are useless" side and is noticing some very clear usecases where a value add is present.

NaN years ago

undefined

NaN years ago

undefined

anothermathbozo 13 days ago

> The reality of what these tools can do is sinking in

It feels premature to make determinations about how far this emergent technology can be pushed.

NaN years ago

undefined

hn_throwaway_99 13 days ago

> The boosters are in 5 stages of grief coming to terms with what was once AGI and is now a mere co-pilot, while the haters are coming to terms with the fact that LLMs can actually be useful in a variety of usecases.

I couldn't agree with this more. I often get frustrated because I feel like the loudest voices in the room are so laughably extreme. One on side you have the "AGI cultists", and on the other you have the "But the hallucinations!!!" people. I've personally been pretty amazed by the state of AI (nearly all of this stuff was the domain of Star Trek just a few years ago), and I get tons of value out of many of these tools, but at the same time I hit tons of limitations and I worry about the long-term effect on society (basically, I think this "ask AI first" approach, especially among young people, will kinda turn us all into idiots, similar to the way Google Maps made it hard for most of us to remember the simple directions). I also can't help but roll my eyes when I hear all the leaders of these AI companies going on about how AI will make a "white collar bloodbath" - there is some nuggets of truth in that, but these folks are just using scare tactics to hype their oversold products.

pera 13 days ago

Exactly! What skeptics don't get is that AGI is already here and we are now starting a new age of infinite prosperity, it's just that exponential growth looks flat at first, obviously...

Quantum computers and fusion energy are basically solved problems now. Accelerate!

NaN years ago

undefined

westoncb 13 days ago

> and must say that I'm a bit confused by this presentation, and it's a bit unclear to me what it adds.

I think the disconnect might come from the fact that Karpathy is speaking as someone who's day-to-day computing work has already been radically transformed by this technology (and he interacts with a ton of other people for whom this is the case), so he's not trying to sell the possibility of it: that would be like trying to sell the possibility of an airplane for someone who's already just cruising around in one every day. Instead the mode of the presentation is more: well, here we are at the dawn of a new era of computing, it really happened. Now how can we relate this to the history of computing to anticipate where we're headed next?

> ...but sometimes an LLM becomes the operating system, sometimes it's the CPU, sometimes it's the mainframe from the 60s with time-sharing, a big fab complex, or even outright electricity itself?

He uses these analogies in clear and distinct ways to characterize separate facets of the technology. If you were unclear on the meanings of the separate analogies it seems like the talk may offer some value for you after all but you may be missing some prerequisites.

> This demo app was in a presentable state for a demo after a day, and it took him a week to implement Googles OAuth2 stuff. Is that somehow exciting? What was that?

The point here was that he'd built the core of the app within a day without knowing the Swift language or ios app dev ecosystem by leveraging LLMs, but that part of the process remains old-fashioned and blocks people from leveraging LLMs as they can when writing code—and he goes on to show concretely how this could be improved.

Workaccount2 13 days ago

The fundamental mistake I see is people applying LLMs to the current paradigm of software; enormous hulking codebases made to have as many features as possible to appeal to as many users as possible.

LLMs are excellent at helping non-programmers write narrow use case, bespoke programs. LLMs don't need to be able to one-shot excel.exe or Plantio.apk so that Christine can easily track when she watered and fed her plants nutrients.

The change that LLMs will bring to computing is much deeper than Garden Software trying to slot in some LLM workers to work on their sprawling feature-pack Plantio SaaS.

I can tell you first hand I have already done this numerous times as a non-programmer working a non-tech job.

skydhash 13 days ago

The thing is that there’s a need to integrate all these little tools because the problems they solve is part of the same domain. And that’s where problems lie. Something like Excel have an advantage as being a common platform for both data and procedures. Unix adopted text and pipes for integration.

demosthanos 13 days ago

What you're missing is the audience.

This talk is different from his others because it's directed at aspiring startup founders. It's about how we conceptualize the place of an LLM in a new business. It's designed to provide a series of analogies any one of which which may or may not help a given startup founder to break out of the tired, binary talking points they've absorbed from the internet ("AI all the things" vs "AI is terrible") in favor of a more nuanced perspective of the role of AI in their plans. It's soft and squishy rhetoric because it's not about engineering, it's about business and strategy.

I honestly left impressed that Karpathy has the dynamic range necessary to speak to both engineers and business people, but it also makes sense that a lot of engineers would come out of this very confused at what he's on about.

whilenot-dev 13 days ago

I get that, motivating young founders is difficult, and I think he has a charming geeky way of provoking some thoughts. But on the other hand: Why mainframes with time-sharing from the 60s? Why operating systems? LLMs to tell you how to boil an egg, seriously?

Putting my engineering hat on, I understand his idea of the "autonomy slider" as lazy workaround for a software implementation that deals with one system boundary. He should aspire people there to seek out for unknown boundaries, not provide implementation details to existing boundaries. His MenuGen app would probably be better off using a web image search instead of LLM image generation. Enhancing deployment pipelines with LLM setups is something for the last generation of DevOps companies, not the next one.

Please mention just once the value proposition and responsibilities when handling large quantities of valuable data - LLMs wouldn't exist without them! What makes quality data for an LLM, or personal data?

nodesocket 14 days ago

llms.txt makes a lot of sense, especially for LLMs to interact with http APIs autonomously.

Seems like you could set a LLM loose and like the Google Bot have it start converting all html pages into llms.txt. Man, the future is crazy.

nothrabannosir 14 days ago

Couldn’t believe my eyes. The www is truly bankrupt. If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Website too confusing for humans? Add more design, modals, newsletter pop ups, cookie banners, ads, …

Website too confusing for LLMs? Add an accessible, clean, ad-free, concise, high entropy, plain text summary of your website. Make sure to hide it from the humans!

PS: it should be /.well-known/llms.txt but that feels futile at this point..

PPS: I enjoyed the talk, thanks.

andrethegiant 14 days ago

> If anyone has a browser plugin which automatically redirects to llms.txt sign me up.

Not a browser plugin, but you can prefix URLs with `pure.md/` to get the pure markdown of that page. It's not quite a 1:1 to llms.txt as it doesn't explain the entire domain, but works well for one-off pages. [disclaimer: I'm the maintainer]

NaN years ago

undefined

jph00 14 days ago

The next version of the llms.txt proposal will allow an llms.txt file to be added at any level of a path, which isn't compatible with /.well-known.

(I'm the creator of the llms.txt proposal.)

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

alightsoul 14 days ago

The web started dying with mobile social media apps, in which hyperlinks are a poor UX choice. Then again with SEO banning outlinks. Now this. The web of interconnected pages that was the World Wide Web is dead. Not on social media? No one sees you. Run a website? more bots than humans. Unless you sell something on the side with the website it's not profitable. Hyperlinking to other websites is dead.

Gen Alpha doesn't know what a web page is and if they do, it's for stuff like neocities aka as a curiosity or art form only. Not as a source of information anymore. I don't blame them. Apps (social media apps) have less friction than web sites but have a higher barrier for people to create. We are going back to pre World Wide Web days in a way, kind of like Bulletin Board Systems on dial up without hyperlinking, and centralized (social media) Some countries mostly ones with few technical people llike the ones in Central America have moved away from the web almost entirely and into social media like Instagram.

Due to the death of the web, google search and friends now rely mostly on matching queries with titles now so just like before the internet you have to know people to learn new stuff or wait for an algorithm to show it to you or someone to comment it online or forcefully enroll in a university. Maybe that's why search results have declined and poeple search using ChatGPT or maybe perplexity. Scholarly search engines are a bit better but frankly irrelevant for most poeple.

Now I understand why Google established their own DNS server at 8.8.8.8. If you have a directory of all domains on DNS, you can still index sites without hyperlinks between them, even if the web dies. They saw it coming.

NaN years ago

undefined

practal 14 days ago

If you have different representations of the same thing (llms.txt / HTML), how do you know it is actually equivalent to each other? I am wondering if there are scenarios where webpage publishers would be interested in gaming this.

andrethegiant 14 days ago

NaN years ago

undefined

jph00 14 days ago

That's not what llms.txt is. You can just use a regular markdown URL or similar for that.

llms.txt is a description for an LLM of how to find the information on your site needed for an LLM to use your product or service effectively.

llms-txt 14 days ago

[dead]

blixt 14 days ago

If we extrapolate these points about building tools for AI and letting the AI turn prompts into code I can’t help but reach the conclusion that future programming languages and their runtimes will be heavily influenced by the strengths and weaknesses of LLMs.

What would the code of an application look like if it was optimized to be efficiently used by LLMs and not humans?

* While LLMs do heavily tend towards expecting the same inputs/outputs as humans because of the training data I don’t think this would inhibit co-evolution of novel representations of software.

mythrwy 14 days ago

It does seem a bit silly long term to have something like Python which was developed as a human friendly language written by LLMs.

If AI is going to write all the code going forward, we can probably dispense with the user friendly part and just make everything efficient as possible for machines.

doug_durham 14 days ago

I don't agree. Important code will need to be audited. I think the language of the future will be easy to read by human reviewers but deterministic. It won't be a human language. Instead it will be computer language with horrible ergonomics. I think Python or straight up Java would be a good start. Things like templates wouldn't be necessary since you could express that deterministically in a higher level syntax (e.g. A list of elements that can accept any type). It would be an interesting exercise.

mostlysimilar 14 days ago

If humans don't understand it to write the data the LLM is trained on, how will the LLM be able to learn it?

thierrydamiba 14 days ago

Is a world driven by the strengths and weaknesses of programming languages better than the one driven by the strengths and weaknesses of LLMs?

ivape 13 days ago

Better to think of it as a world driven by the strengths and weaknesses of people. Is the world better if more people can express themselves via software? Yes.

I don’t believe in coincidences. I don’t think the universe provided AI by accident. I believe it showed up just at the moment where the universe wants to make it clear - your little society of work and status and money can go straight to living hell. And that’s where it’s going, the developer was never supposed to be a rockstar, they were always meant to be creatives who do it because they like it. Fuck this job bullshit, those days are over. You will program the same way you play video games, it’s never to be work again (it’s simply too creative).

Will the universe make it so a bunch of 12 year olds dictate software in natural language in a Roblox like environment that rivals the horeshit society sold for billions just a decade ago? Yes, and thank god. It’s been a wild ride, thank you god for ending it (like he did with nuclear bombs after ww2, our little universe of war shrunk due to that).

Anyways, always pay attention to the little details, it’s never a coincidence. The universe doesn’t just sit there and watch our fiasco believe it or not, it gets involved.

s_ting765 13 days ago

Given the plethora of programming languages that exist today, I'm not worried at all about AI taking over SWE jobs.

maitredusoi 14 days ago

[dead]

internet_rand0 14 days ago

[dead]

old_man_cato 13 days ago

The image of a bunch of children in a room gleefully playing with their computers is horror movie type stuff, but because it's in a white room with plants and not their parent's basement with the lights off, it's somehow a wonderful future.

Karpathy and his peer group are some of the most elitist and anti social people who have ever lived. I wonder how history will remember them.

whatarethembits 13 days ago

Its early days. Agree with your point that the "vision" of the future laid out by tech people doesn't have much of a chance of becoming (accepted) reality, because its necessarily a reflection of their own inner world, largely devoid of importance and interactions with other people. Prime example, see metaverse. Most of us don't want to replace the real world with a (crappy) digital one; the sooner we build things that respects that fundamental value, the sooner we can build things that actually improves our lives.

8note 13 days ago

did you not have the computer room open to flash games and the like over lunch time? competitive 4 player bmtron was a blast way back whenhttps://www.games1729.com/archive/

old_man_cato 13 days ago

I did. I also had basically unlimited access to pornography and I saw more than one video of someone having their head severed off. But yeah, I played a lot of computer games. That was fun.

mirsadm 13 days ago

I thought that video was generated. Everything about it seemed off

bicepjai 11 days ago

A lot of people reach for the “electricity” analogy whenever a tech wave crests—crypto, cloud, and now LLMs. With crypto, the comparison always felt forced: the utility was niche, and the energy cost was hard to justify. LLMs, on the other hand, are genuinely useful, but is the electricity comparison still valid ?

0xjunhao 11 days ago

Before I became a software engineer, I was a computational physicist. My days back then were pretty much tweaking some parameters, running a job, then reading papers and checking back after a few minutes or hours. Increasingly, I’m starting to think my days as a software engineer will be pretty similar.

ankurdhama 13 days ago

Where are the debugging tools for the so called "Software 3.0" ?

autobodie 13 days ago

If the prompt is good, the LLM will tell you when it's wrong, but you can use production testing if necessary like Tesla.

raffael_de 13 days ago

I'm a little surprised at how negative he is towards textual interfaces and text for representing information.

manyaoman 13 days ago

I didn't get the impression that he's against text per se, just that LLMs should use a format that's most concise for humans in the given scenario. Example from the video: showing the (textual) diff between old and new versions of text/code, rather than just the new version. Or converting a text-only restaurant menu to photos+text.

Waterluvian 13 days ago

This got me thinking about something…

Isn’t an LLM basically a program that is impossible to virus scan and therefore can never be safely given access to any capable APIs?

For example: I’m a nice guy and spend billions on training LLMs. They’re amazing and free and I hand out the actual models for you all to use however you want. But I’ve trained it very heavily on a specific phrase or UUID or some other activation key being a signal to <do bad things, especially if it has console and maybe internet access>. And one day I can just leak that key into the world. Maybe it’s in spam, or on social media, etc.

How does the community detect that this exists in the model? Ie. How does the community virus scan the LLM for this behaviour?

robertk 13 days ago

You may be interested in: https://www.anthropic.com/research/sleeper-agents-training-d... https://arxiv.org/abs/2404.13660

Waterluvian 13 days ago

Yes these look perfect! Thank you.

orbital-decay 13 days ago

This is what mechanistic interpretability studies are trying to achieve, and it's not yet realistically possible for a general case.

avarun 13 days ago

Similarly to how you can never guarantee that one of your trusted employees won’t be made a foreign asset.

theGnuMe 13 days ago

This is a good insight. There’s also a similar insight about compilers back in the days before AV.. we will have AV LLMs etc… basically reinvent everything for the new stack.

jedimastert 13 days ago

I was just talking to somebody at work about a "Trusting Trust" style attack from LLMs. I will remain deeply suspicious of them

autobodie 13 days ago

Profit over security, outsource liability

LZ_Khan 13 days ago

I do feel like large scale LLM vulnerabilities will be the real Y2K

tinyhouse 14 days ago

After Cursor is sold for $3B, they should transfer Karpathy 20%. (it also went viral before thanks to him tweeting about it)

Great talk like always. I actually disagree on a few things with him. When he said "why would you go to ChatGPT and copy / paste, it makes much more sense to use a GUI that is integrated to your code such as Cursor".

Cursor and the like take a lot of the control from the user. If you optimize for speed then use Cursor. But if you optimize for balance of speed, control, and correctness, then using Cursor might not be the best solution, esp if you're not an expert of how to use it.

It seems that Karpathy is mainly writing small apps these days, he's not working on large production systems where you cannot vibe code your way through (not yet at least)

mentalgear 14 days ago

Meanwhile, I asked this morning Claude 4 to write a simple EXIF normalizer. After two rounds of prompting it to double-check its code, I still had to point out that it makes no sense to load the entire image for re-orientating if the EXIF orientation is fine in the first place.

Vibe vs reality, and anyone actually working in the space daily can attest how brittle these systems are.

Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far to complex to simulate in its vastness.

diggan 14 days ago

> Meanwhile

What do you mean "meanwhile", that's exactly (among other things) the kind of stuff he's talking about? The various frictions and how you need to approach it

> anyone actually working in the space

Is this trying to say that Karpathy doesn't "actually work" with LLMs or in the ML space?

I feel like your whole comment is just reacting to the title of the YouTube video, rather than actually thinking and reflecting on the content itself.

demaga 14 days ago

I'm pretty sure "actually work" part refers to SWE space rather than LLM/ML space

Seanambers 14 days ago

Seems to me that this is just another level of throwing compute at the problem.

Same way programs was way more efficient before and now they are "bloated" with packages, abstractions, slow implementations of algos and scaffolding.

The concept of what is good software development might be changing as well.

LLMs might not write the best code, but they sure can write a lot of it.

ApeWithCompiler 14 days ago

A manager in our company introduced Gemini as a chat bot coupled to our documentation.

> It failed to write out our company name.The rest was flawed with hallucinations also, hardly worth to mention.

I wish this is a rage bait towards others, but what should me feelings be? After all this is the tool thats sold to me, I am expected to work with.

gorbachev 14 days ago

We had exactly the opposite experience. CoPilot was able to answer questions accurately and reformatted the existing documentation to fit the context of users' questions, which made the information much easier to understand.

Code examples, which we offer as sort of reference implementations, were also adopted to fit the specific questions without much issues. Granted these aren't whole applications, but 10 - 25 line examples of doing API setup / calls.

We didn't, of course, just send users' questions directly to CoPilot. Instead there's a bit of prompt magic behind the scenes that tweaks the context so that CoPilot can produce better quality results.

ramon156 14 days ago

The real question is how long it'll take until they're not brittle

kubb 14 days ago

Or will they ever be reliable. Your question is already making an assumption.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

yahoozoo 14 days ago

“Treat it like a junior developer” … 5 years later … “Treat it like a junior developer”

NaN years ago

undefined

NaN years ago

undefined

guappa 14 days ago

∞

hombre_fatal 14 days ago

On the other hand, posts like this are like watching someone writing ask jeeves search queries into google 20 years ago and then gesturing how google sucks while everyone else in the room has figured out how to be productive with it and cringes at his "boomer" queries.

If you're still struggling to make LLMs useful for you by now, you should probably ask someone. Don't let other noobs on HN +1'ing you hold you back.

mirrorlake 14 days ago

Perhaps consider making some tutorials, then, and share your wealth of knowledge rather than calling people stupid.

NaN years ago

undefined

coreyh14444 14 days ago

https://theeducationist.info/everything-amazing-nobody-happy...

NaN years ago

undefined

sensanaty 14 days ago

There's also those instances where Microsoft unleashed Copilot on the .NET repo, and it resulted in the most hilariously terrible PRs that required the maintainers to basically tell Copilot every single step it should take to fix the issue. They were basically writing the PRs themselves at that point, except doing it through an intermediary that was much dumber, slower and less practical than them.

And don't get me started on my own experiences with these things, and no, I'm not a luddite, I've tried my damndest and have followed all the cutting-edge advice you see posted on HN and elsewhere.

Time and time again, the reality of these tools falls flat on their face while people like Andrej hype things up as if we're 5 minutes away from having Claude become Skynet or whatever, or as he puts it, before we enter the world of "Software 3.0" (coincidentally totally unrelated to Web 3.0 and the grift we had to endure there, I'm sure).

To intercept the common arguments,

- no I'm not saying LLMs are useless or have no usecases

- yes there's a possibility if you extrapolate by current trends (https://xkcd.com/605/) that they indeed will be Skynet

- yes I've tried the latest and greatest model released 7 minutes ago to the best of my ability

- yes I've tried giving it prompts so detailed a literal infant could follow along and accomplish the task

- yes I've fiddled with providing it more/less context

- yes I've tried keeping it to a single chat rather than multiple chats, as well as vice versa

- yes I've tried Claude Code, Gemini Pro 2.5 With Deep Research, Roocode, Cursor, Junie, etc.

- yes I've tried having 50 different "agents" running and only choosing the best output form the lot.

I'm sure there's a new gotcha being written up as we speak, probably something along the lines of "Well for me it doubled my productivity!" and that's great, I'm genuinely happy for you if that's the case, but for me and my team who have been trying diligently to use these tools for anything that wasn't a microscopic toy project, it has fallen apart time and time again.

The idea of an application UI or god forbid an entire fucking Operating System being run via these bullshit generators is just laughable to me, it's like I'm living on a different planet.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

darqis 14 days ago

when I started coding at the age of 11 in machine code and assembly on the C64, the dream was to create software that creates software. Nowadays it's almost reality, almost because the devil is always in the details. When you're used to write code, writing code is relatively fast. You need this knowledge to debug issues with generated code. However you're now telling AI to fix the bugs in the generated code. I see it kind of like machine code becomes overlaid with asm which becomes overlaid with C or whatever higher level language, which then uses dogma/methodology like MVC and such and on top of that there's now the AI input and generation layer. But it's not widely available. Affording more than 1 computer is a luxury. Many households are even struggling to get by. When you see those what 5 7 Mac Minis, which normal average Joe can afford that or does even have to knowledge to construct an LLM at home? I don't. This is a toy for rich people. Just like with public clouds like AWS, GCP I left out, because the cost is too high and running my own is also too expensive and there are cheaper alternatives that not only cost less but also have way less overhead.

What would be interesting to see is what those kids produced with their vibe coding.

kordlessagain 14 days ago

Kids? Think about all the domain experts, entrepreneurs, researchers, designers, and creative people who have incredible ideas but have been locked out of software development because they couldn't invest 5-10 years learning to code.

A 50-year-old doctor who wants to build a specialized medical tool, a teacher who sees exactly what educational software should look like, a small business owner who knows their industry's pain points better than any developer. These people have been sitting on the sidelines because the barrier to entry was so high.

The "vibe coding" revolution isn't really about kids (though that's cute) - it's about unleashing all the pent-up innovation from people who understand problems deeply but couldn't translate that understanding into software.

It's like the web democratized publishing, or smartphones democratized photography. Suddenly expertise in the domain matters more than expertise in the tools.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

diggan 14 days ago

> those kids produced with their vibe coding

No one, including Karpathy in this video, is advocating for "vibe coding". If nothing more, LLMs paired with configurable tool-usage, is basically a highly advanced and contextual search engine you can ask questions. Are you not using a search engine today?

Even without LLMs being able to produce code or act as agents they'd be useful, because of that.

But it sucks we cannot run competitive models locally, I agree, it is somewhat of a "rich people" tool today. Going by the talk and theme, I'd agree it's a phase, like computing itself had phases. But you're gonna have to actually watch and listen to the talk itself, right now you're basically agreeing with the video yet wrote your comment like you disagree.

infecto 14 days ago

This is most definitely not toys for rich people. Now perhaps depending on your country it may be considered rich but I would comfortably say that for most of the developed world, the costs for these tools are absolutely attainable, there is a reason ChatGPT has such a large subscriber base.

Also the disconnect for me here is I think back on the cost of electronics, prices for the level of compute have generally gone down significantly over time. The c64 launched around the $5-600 price level, not adjusted for inflation. You can go and buy a Mac mini for that price today.

NaN years ago

undefined

NaN years ago

undefined

kapildev 13 days ago

>What would be interesting to see is what those kids produced with their vibe coding.

I think you are referring to what those kids in the vibe coding event produced. Wasn't their output available in the video itself?

dist-epoch 14 days ago

> This is a toy for rich people

GitHub copilot has a free tier.

Google gives you thousands of free LLM API calls per day.

There are other free providers too.

NaN years ago

undefined

lubujackson 13 days ago

Generally, people behind big revolutionary tech are the worst suited for understanding how it will do "in the wild". Forest for the trees and all that.

Some good nuggets in this talk, specifically his concept that Software 1.0, 2.0 and 3.0 will all persist and all have unique use cases. I definitely agree with that. I disagree with his belief that "anyone can vibe code" mindset - this works to a certain level of fidelity ("make an asteroids clone") but what he overlooks is his ability, honed over many years, to precisely document requirements that will translate directly to code that works in an expected way. If you can't write up a Jira epic that covers all bases of a project, you probably can't vibe code something beyond a toy project (or an obvious clone). LLM code falls apart under its own weight without a solid structure, and I don't think that will ever fundamentally change.

Where we are going next, and a lot of effort is being put behind, is figuring out exactly how to "lengthen the leash" of AI through smart framing, careful context manipulation and structured requests. We obviously can have anyone vibe code a lot further if we abstract different elements into known areas and simply allow LLMs to stitch things together. This would allow much larger projects with a much higher success rate. In other words, I expect an AI Zapier/Yahoo Pipes evolution.

Lastly, I think his concept of only having AI pushing "under 1000 line PRs" that he carefully reviews is more short-sighted. We are very, very early in learning how to control these big stupid brains. Incrementally, we will define sub-tasks that the AI can take over completely without anyone ever having to look at the code, because the output will always be within an accepted and tested range. The revolution will be at the middleware level.

superconduct123 13 days ago

Where was he was saying you could vibe code beyond a simple app?

He even said it could be a gateway to actual programming

jmsdnns 13 days ago

There is another angle to this too.

Prior to LLMs, it was amusing to consider how ML folks and software folks would talk passed each other. It was amusing because both sides were great at what they do, neither side understood the other side, and they had to work together anyway.

After LLMs, we now have lots of ML folks talking about the future of software, so ething previously established to be so outside their expertise that communication with software engineers was an amusing challenge.

So I must ask, are ML folks actually qualified to know the future of software engineering? Shouldnt we be listening to software engineers instead?

NaN years ago

undefined

NaN years ago

undefined

AlexCoventry 13 days ago

I've seen evidence of "anyone can vibe code", but at this stage the result tends to be a 5,000-line application intricately entangled with 500,000 lines of irrelevant slop. Still, the wonder is that the bear can dance at all. That's a new thing under the sun.

NaN years ago

undefined

fergie 14 days ago

There were some cool ideas- I particularly liked "psychology of AI"

Overall though I really feel like he is selling the idea that we are going to have to pay large corporations to be able to write code. Which is... terrifying.

Also, as a lazy developer who is always trying to make AI do my job for me, it still kind of sucks, and its not clear that it will make my life easier any time soon.

teekert 14 days ago

He says that now we are in the mainframe phase. We will hit the personal computing phase hopefully soon. He says llama (and DeepSeek?) are like Linux in a way, OpenAI and Claude are like Windows and MacOS.

So, No, he’s actually saying it may be everywhere for cheap soon.

I find the talk to be refreshingly intellectually honest and unbiased. Like the opposite of a cringey LinkedIn post on AI.

NaN years ago

undefined

guappa 14 days ago

I think it used to be like that before the GNU people made gcc, completely destroying the market of compilers.

> Also, as a lazy developer who is always trying to make AI do my job for me, it still kind of sucks, and its not clear that it will make my life easier any time soon.

Every time I have to write a simple self contained couple of functions I try… and it gets it completely wrong.

It's easier to just write it myself rather than to iterate 50 times and hope it will work, considering iterations are also very slow.

NaN years ago

undefined

geraneum 14 days ago

On a tangent, I find the analogies interesting as well. However, while Karpathy is an expert in Computer Science, NLP and machine vision, his understanding of how human psychology and brain work is as good as you an I (non-experts). So I take some of those comparisons as a lay person’s feelings about the subject. Still, they are fun to listen to.

j45 13 days ago

It's interesting how researchers are ahead on some insights and introducing them, and it feels like some are new to them but it might already exist and they're helping present them to the world.

A positive video all around, have got to learn a lot from Andrej's Youtube account.

LLMs are really strange, I don't know if I've seen a technology where the technology class that applies it (or can verify applicability) has been so separate or unengaged compared to the non-technical people looking to solve problems.

pera 14 days ago

Is it possible to vibe code NFT smart contracts with Software 3.0?

johnwheeler 13 days ago

https://github.com/screencam/typescript-mcp-server

I've been working on this project. I built this in about two days, using it to build itself at the tail end of the effort. It's not perfect, but I see the promise in it. It stops the thrashing the LLMs can do when they're looking for types or trying to resolve anything like that.

diggan 13 days ago

> Traditional: Read 5000 lines → Find method → Replace → Write 5000 lines

What of today's agents work like this? None of the ones I've tried would do something like that, but instead would grep/search the file, then do a smaller edit (different tools do those in different ways).

Overall, it does feel like a strawman argument against "Traditional" when almost none of the tooling actually works like that.

NaN years ago

undefined

matiasmolinas 14 days ago

https://github.com/EvolvingAgentsLabs/llmunix

An experiment to explore Kaparthy ideas

bawana 14 days ago

how do i install this thing?

NaN years ago

undefined

MoonGhost 11 days ago

He didn't mention multi-modal models. Probably because they don't fit in the oversimplified picture.

nickalex 10 days ago

I believe AI relies too heavily on logic—and that, surprisingly, can be a disadvantage. Logical solutions don’t always work in real-world situations, because logic isn't the same as creativity. And creativity is essential.

wiremine 13 days ago

I spent a lot of time thinking about this recently. Ultimately, English is not a clean, deterministic abstraction layer. This isn't to say that LLMs aren't useful, and can create some great efficiencies.

npollock 13 days ago

no, but a subset of English could be

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

yahoozoo 14 days ago

I was trying to do some reverse engineering with Claude using an MCP server I wrote for a game trainer program that supports Python scripts. The context window gets filled up _so_ fast. I think my server is returning too many addresses (hex) when Claude searches for values in memory, but it’s annoying. These things are so flaky.

diggan 13 days ago

Yeah, usually I'd steer my agents to never use the output directly from any command, and instead redirect it to a logfile, then force it to search/grep stuff directly from the log-file instead of just getting all the outputs at all times. Seems to work OK.

meerab 12 days ago

See complete transcript of Andrej Karpathy's video

https://videotobe.com/play/youtube/LCEmiRjPEtQ

sockboy 13 days ago

Definitely hit this wall too. The backend just for API proxy feels like a detour when all you want is to ship a quick prototype. Would love to see more tools that make this seamless, especially for solo builders.

fnord77 14 days ago

Him claiming govts don't use AI or are behind the curve is not accurate.

Modern military drones are very much AI agents

manyaoman 13 days ago

Governments obviously lead in military tech, but do you think they have access to better AI (in general) than consumers? Unless they do, I think it's fair to say that governments are behind the curve, since consumers tend to adopt things more quickly.

NaN years ago

undefined

NaN years ago

undefined

password4321 14 days ago

[dead]

romain_batlle 14 days ago

Can't believe they wanted to postpone this video by a few weeks

dang 13 days ago

No one wanted to! I think we might have bitten off more than we could chew in terms of video production. There is a lot of content to publish.

Once it was clear how high the demand was for this talk, the team adapted quickly.

That's how it goes sometimes! Future iterations will be different.

poorcedural 14 days ago

Software 3.0 is where Engineers only create the kernel or seed of an idea. Then all users are developers creating their own branch using the feedback loop of their own behavior.

ldenoue 13 days ago

Full playable transcript https://www.appblit.com/scribe?v=LCEmiRjPEtQ

swyx 13 days ago

slides: https://docs.google.com/presentation/d/1sZqMAoIJDxz79cbC5ap5...

longhaul 11 days ago

QA is what SEs will be doing - testing , followed by feedback to LLMs. Why can’t just product folks do this eventually w/o SEs?

klysm 11 days ago

Product folks don’t know what they want a lot of the time and don’t know what’s possible

himanshuy 13 days ago

why there are so many bots posting comments?

belter 14 days ago

Painful to watch. The new tech generation deserves better than hyped presentations from tech evangelists.

This reminds me of the Three Amigos and Grady Booch evangelizing the future of software while ignoring the terrible output from Rational Software and the Unified Process.

At least we got acknowledgment that self-driving remains unsolved: https://youtu.be/LCEmiRjPEtQ?t=1622

And Waymo still requires extensive human intervention. Given Tesla's robotaxi timeline, this should crash their stock valuation...but likely won't.

You can't discuss "vibe coding" without addressing security implications of the produced artifacts, or the fact that you're building on potentially stolen code, books, and copyrighted training data.

And what exactly is Software 3.0? It was mentioned early then lost in discussions about making content "easier for agents."

digianarchist 13 days ago

In his defense he clearly articulated that meaningful change has not yet been achieved and could be a decade away. Even pointing to specific examples of LLMs failing to count letters and do basic arithmetic.

What I find absent is where do we go from LLMs? More hardware, more training. "This isn't the scientific breakthrough you're looking for".

kdrvr 12 days ago

I honestly like his perspective around vibe coding. I feel like his original tweet has been taken misunderstood by the mainstream. (Proof-of-concepts churned out over the weekend will usually die or be mostly rewritten, anyways.) For programmers dipping their feet into new areas, I believe it can be useful.

Though, I do not see it being useful as a "gateway drug" (as he says) for kids learning to code. I have seen that children can understand langs and base programming concepts, given the right resources and encouragement. If kids in the 80s/early 90s learned BASIC and grew up to become software engineers; then what we have now (Scratch, Python, even Javascript + something like P5) are perfectly adequate to that task. Vibe coding really just teaches kids how to prompt LLMs properly.

alightsoul 13 days ago

It's interesting to see people here and on Blind are more wary? of AI than people in say, Reddit or Youtube comments

sponnath 13 days ago

Reddit and YouTube are such huge social media platforms that it really depends on which bubble (read: subreddits/yt channels) you're looking at. There's the "AGI is here" people over at r/singularity and then the "AI is useless" people at r/programming. I'm simplifying arguments from both sides here but you get my point.

NaN years ago

undefined

NaN years ago

undefined

goosebump 13 days ago

https://software3.com/index.htm

Amazing!!!

jes5199 13 days ago

okay I’m practicing my new spiel:

this focus on coding is the wrong level of abstraction

coding is no longer the problem. the problem is getting the right context to the coding agent. this is much, much harder

“vibe coding” is the new “horseless carriage”

the job of the human engineer is “context wrangling”

diggan 13 days ago

> coding is no longer the problem.

"Coding" - The art of literally using your fingers to type weird characters into a computer, was never a problem developers had.

The problem has always been understanding and communication, and neither of those have been solved at this moment. If anything, they have gotten even more important, as usually humans can infer things or pick up stuff by experience, but LLMs cannot, and you have to be very precise and exact about what you're telling them.

And so the problem remains the same. "How do I communicate what I want to this person, while keeping the context as small as possible as to not overflow, yet extensive enough to cover everything?" except you're sending it to endpoint A instead of endpoint B.

NaN years ago

undefined

throw234234234 13 days ago

I will counter this with the fact that sometimes, and depending on the abstraction level that you are trying to solve/work at code or some other determinstic language is the right and easier way language to describe the context. This doesn't just apply to SWE, but all forms of engineering (electrical, civil, mechanical, etc).

We have math notation for maths, diagrams for circuits, plans for houses, etc etc. Would hate to have to give long paragraphs of "English" to my house builder and watch what the result could be. Feels like being a lawyer at this point. English can be appropriate and now we also have that in our toolbox.

Describing context at the abstraction level and accuracy you care about has always been the issue. The context of what matters though as you grow and the same system has to deal with more requirements at once together IMV is always the challenge in ANY engineering discipline.

poorcedural 13 days ago

[dead]

AIorNot 14 days ago

Love his analogies and clear eyed picture

pyman 14 days ago

"We're not building Iron Man robots. We're building Iron Man suits"

NaN years ago

undefined

NaN years ago

undefined

benob 14 days ago

You can generate 1.0 programs with 3.0 programs. But can you generate 2.0 programs the same way?

olmo23 14 days ago

2.0 programs (model weights) are created by running 1.0 programs (training runs).

I don't think it's currently possible to ask a model to generate the weights for a model.

NaN years ago

undefined

taegee 13 days ago

I can't stop thinking about these agents as Agent Smith, The Architect, etc.

politelemon 14 days ago

The beginning was painful to watch as is the cheering in this comment section.

The 1.0, 2.0, and 3.0 simply aren't making sense. They imply a kind of a succession and replacement and demonstrate a lack of how programming works. It sounds as marketing oriented as "Web 3.0" that has been born inside an echo chamber. And yet halfway through, the need for determinism/validation is now being reinvented.

The analogies make use of cherry picked properties, which could apply to anything.

mentalgear 14 days ago

The whole AI scene is starting to feel a lot like the cryptocurrency bubble before it burst. Don’t get me wrong, there’s real value in the field, but the hype, the influencers, and the flashy “salon tricks” are starting to drown out meaningful ML research (like Apple's critical research that actually improves AI robustness). It’s frustrating to see solid work being sidelined or even mocked in favor of vibe-coding.

Vibe vs reality, and anyone actually working in the space daily can attest how brittle these systems are.

NaN years ago

undefined

monsieurbanana 14 days ago

> "Because they all have slight pros and cons, and you may want to program some functionality in 1.0 or 2.0, or 3.0, or you're going to train in LLM, or you're going to just run from LLM"

He doesn't say they will fully replace each other (or had fully replaced each other, since his definition of 2.0 is quite old by now)

NaN years ago

undefined

diggan 13 days ago

> The beginning was painful to watch as is the cheering in this comment section.

Yours is the second comment claiming there is "cheering" and "fanboying" in this comment section. What comments are you talking about? I've read through this submission multiple times since yesterday, yet I've seen none of that. What specific comments are the "cheering" ones?

amelius 14 days ago

The version numbers mean abrupt changes.

Analogy: how we "moved" from using Google to ChatGPT is an abrupt change, and we still use Google.

mazhar_TUF 14 days ago

[dead]

ukprogrammer 14 days ago

Why do non-users of LLM's like to despise/belittle them so much?

Just don't use them, and, outcompete those who do. Or, use them and outcompete those who don't.

Belittling/lamenting on any thread about them is not helpful and akin to spam.

djeastm 13 days ago

Some people are annoyed at the hype, some are making good faith arguments about the pros/cons, and some people are just cranky. AI is a popular subject and we've all got our hot takes.

dmitrijbelikov 14 days ago

I think that Andrej presents “Software 3.0” as a revolution, but in essence it is a natural evolution of abstractions.

Abstractions don't eliminate the need to understand the underlying layers - they just hide them until something goes wrong.

Software 3.0 is a step forward in convenience. But it is not a replacement for developers with a foundation, but a tool for acceleration, amplification and scaling.

If you know what is under the hood — you are irreplaceable. If you do not know — you become dependent on a tool that you do not always understand.

poorcedural 13 days ago

Foundational programmers form the base of where the seed can grow.

In a way programmers found where our roots grow, they can not find your limits.

Software 3.0 is a step into a different light, where software finds its own limits.

If we know where they are rooted, we will merge their best attempts. Only because we appreciate their resultant behavior.

NaN years ago

undefined

bedit 14 days ago

I love the "people spirits" analogy. For casual tasks like vibecoding or boiling an egg, LLM errors aren't a big deal. But for critical work, we need rigorous checks—just like we do with human reasoning. That's the core of empirical science: we expect fallibility, so we verify. A great example is how early migration theories based on pottery were revised with better data like ancient DNA (see David Reich). Letting LLMs judge each other without solid external checks misses the point—leaderboard-style human rankings are often just as flawed.

kypro 13 days ago

I know we've had thought leaders in tech before, but am I the only one who is getting a bit fed up by practically anything a handful of people in the AI space say being circulated everywhere in tech spaces at the moment?

dang 13 days ago

If there are lesser-known voices who are as interesting as karpathy or simonw (to mention one other example), I'd love to know who they are so we can get them into circulation on HN.

danny_codes 13 days ago

No it’s incredibly annoying I agree.

The hype hysteria is ridiculous.

kaycey2022 14 days ago

I hope this excellent talk brings some much needed sense into the discourse around vibe coding.

diggan 14 days ago

If anything I wished the conversation turned away from "vibe-coding" which was essentially coined as a "lol look at this go" thing, but media and corporations somehow picked up as "This is the new workflow all developers are adopting".

LLMs as another tool in your toolbox? Sure, use it where it makes sense, don't try to make them do 100% of everything.

LLMs as a "English to E2E product I'm charging for"? Lets maybe make sure the thing works well as a tool before letting it be responsible for stuff.

tudorizer 14 days ago

95% terrible expression of the landscape, 5% neatly dumbed down analogies.

English is a terrible language for deterministic outcomes in complex/complicated systems. Vibe coders won't understand this until they are 2 years into building the thing.

LLMs have their merits and he sometimes aludes to them, although it almost feels accidental.

Also, you don't spend years studying computer science to learn the language/syntax, but rather the concepts and systems, which don't magically disappear with vibe coding.

This whole direction is a cheeky Trojan horse. A dramatic problem, hidden in a flashy solution, to which a fix will be upsold 3 years from now.

I'm excited to come back to this comment in 3 years.

diggan 14 days ago

> English is a terrible language for deterministic outcomes in complex/complicated systems

I think that you seem to be under the impression that Karpathy somehow alluded to or hinted at that in his talk, which indicates you haven't actually watched the talk, which makes your first point kind of weird.

I feel like one of the stronger points he made, was that you cannot treat the LLMs as something they're explicitly not, so why would anyone expect deterministic outcomes from them?

He's making the case for coding with LLMs, not letting the LLMs go by themselves writing code ("vibe coding"), and understanding how they work before attempting to do so.

NaN years ago

undefined

rudedogg 13 days ago

> English is a terrible language for deterministic outcomes in complex/complicated systems.

Someone here shared this ancient article by Dijkstra about this exact thing a few weeks ago: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

NaN years ago

undefined

oc1 13 days ago

AI is all about context window. If you figured out the context problem, you will see that all these "AI is bullshit, it doesn't work and can't produce working code" goes away. Same for everything else.

NaN years ago

undefined

NaN years ago

undefined

strangescript 13 days ago

Who said I wanted my outcomes to be deterministic. Why is it that the only way we accept programming is for completely deterministic outcomes, when the reality is that is an implementation detail.

I am a real user and I am on a general purpose e-commerce site and my ask is "I want a TV that is not that expensive", then by definition the user request is barely deterministic. User requests are normally like this for any application. High level and vague at best. Then developers spend all their time on edge cases, user QA, in the weeds junk that the User does not care about at all. People dont want to click filters and fill out forms for your app. They want it to be easy.

NaN years ago

undefined

qjack 13 days ago

While I agree with you broadly, remember that those that employ you don't have those skills either. They accept that they are ceding control of the details and trust us to make those decisions or ask clarifying questions (LLMs are getting better at those things too). Vibe coders are clients seeking an alternative, not developers.

NaN years ago

undefined

NaN years ago

undefined

brainless 13 days ago

I am not sure I got your point about English. I thought Karpathy was talking about English being the language of prompts, not output. Outputs can be English but if the goal is to compute using the output, then we need structured output (JSON, snippets of code, etc.), not English.

NaN years ago

undefined

poorcedural 13 days ago

Time is a funny calculator, measuring how an individual is behind. And in the funny circumstance that an individual is human, they look back on this comment in 3 years and wonder why humans only see themselves.

m3kw9 13 days ago

Like biz logic requirements they need to be fine grained defined

serjester 13 days ago

I think you’re straw manning his argument.

He explicitly says that both LLMs and traditional software have very important roles to play.

LLMs though are incredibly useful when encoding the behavior of the system deterministically is impossible. Previously this fell under the umbrella of problems solved with ML. This would take a giant time investment and a highly competent team to pull off.

Now anyone can solve many of these same problems with a single API call. It’s easy to wave this off, but this a total paradigm shift.

belter 13 days ago

You just described Software 4.0...

NaN years ago

undefined

bgwalter 14 days ago

I'd like to hear from Linux kernel developers. There is no significant software that has been written (plagiarized) by "AI". Why not ask the actual experts who deliver instead of talk?

This whole thing is a religion.

mellosouls 14 days ago

There is no significant software that has been written (plagiarized) by "AI".

How do you know?

As you haven't evidenced your claim, you could start by providing explicit examples of what is significant.

Even if you are correct, the amount of llm-assisted code is increasing all the time, and we are still only a couple of years in - give it time.

Why not ask the actual experts

Many would regard Karpathy in the expert category I think?

NaN years ago

undefined

NaN years ago

undefined

bytefish 13 days ago

Microsoft is dogfooding Copilot in their dotnet/runtime [1] and dotnet/aspnetcore [2] repositories. This is the only time I have seen a company using its own AI Tools transparently. Yes, they label it an experiment, but I am pretty sure it’s mandated use within Microsoft.

I am an “AI skeptic”, so clearly I am biased here. What I am seeing in the repositories is, that Copilot hasn’t made any substantial contributions so far. The PRs, that went through? They often contain very, very detailed feedback, up to the point line by line replacements have been suggested.

The same engineers, that went up stage at “Microsoft Build 2025” to tell how amazing Copilot is and how it made them a 100x developer? They are not using Copilot in any of their PRs.

You said it’s a religion. I’d say it’s a cult. Whatever it is, outside the distortion bubble, this whole thing looks pretty bad to me.

[1] https://github.com/dotnet/runtime/pulls

[2] https://github.com/dotnet/aspnetcore/pulls

diggan 14 days ago

What counts as "significant software"? Only kernels I guess?

NaN years ago

undefined

NaN years ago

undefined

fHr 13 days ago

big companies still already lay off

huksley 14 days ago

Vibe coding is making a LEGO furniture, getting it run on the cloud is assembling the IKEA table for a busy restaurant

alightsoul 14 days ago

why does vibe coding still involve any code at all? why can't an AI directly control the registers of a computer processor and graphics card, controlling a computer directly? why can't it draw on the screen directly, connected directly to the rows and columns of an LCD screen? what if an AI agent was implemented in hardware, with a processor for AI, a normal computer processor for logic, and a processor that correlates UI elements to touches on the screen? and a network card, some RAM for temporary stuff like UI elements and some persistent storage for vectors that represent UI elements and past converstations

alightsoul 12 hours ago

This is basically what Lovable is. It's a multi billion dollar company today

flumpcakes 14 days ago

I'm not sure this makes sense as a question. Registers are 'controlled' by running code for a given state. An AI can write code that changes registers, as all code does in operation. An AI can't directly 'control registers' in any other way, just as you or I can't.

NaN years ago

undefined

NaN years ago

undefined

birn559 14 days ago

Because any precise description of what the computer is supposed to do is already code as we know it. AI can fill in the gaps between natural language and programming by guessing and because you don't always care about the "how" only about the "what". The more you care about the "how" you have to become more precise in your language to reduce the guess work of the AI to the point that your input to the AI is already code.

The question is: how much do we really care about the "how", even when we think we care about it? Modern programming language don't do guessing work, but they already abstract away quite a lot of the "how".

I believe that's the original argument in favor of coding in assembler and that it will stay relevant.

Following this argument, what AI is really missing is determinism to a far extend. I can't just save my input I have given to an AI and can be sure that it will produce the exact same output in a year from now on.

NaN years ago

undefined

therein 14 days ago

All you need is a framebuffer and AI.

abhaynayar 14 days ago

Nice try, AI.

ast0708 14 days ago

Should we not treat LLMs more as a UX feature to interact with a domain specific model (highly contextual), rather than expecting LLMs to provide the intelligence needed for software to act as partner to Humans.

guappa 14 days ago

He's selling something.

NaN years ago

undefined

NaN years ago

undefined

Aeroi 14 days ago

the fanboying for this dudes opinion is insane.

dang 13 days ago

Maybe so, but please don't post unsubstantive comments to Hacker News.

(Thoughtful criticism that we can learn from is welcome, of course. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.)

NaN years ago

undefined

cedws 13 days ago

I'm about half way through the video and I'm really not seeing what all the praise is about, it just seems to be an AI optimism word salad.

edit: a lot of the comments giving praise on YouTube look like bots...

NaN years ago

undefined

mupuff1234 13 days ago

Yeah, not sure I ever saw anything similar on HN before, feels very odd.

I mean the talk is fine and all but that's about it?

NaN years ago

undefined

mrmansano 14 days ago

It's pastor preaching for the already converted, not new in the area. The only thing new is that they are selling the kool-aid this time.

NaN years ago

undefined

greybox 14 days ago

He's talking about "LLM Utility companies going down and the world becoming dumber" as a sign of humanity's progress.

This if anything should be a huge red flag

bryanh 14 days ago

Replace with "Water Utility going down and the world becoming less sanitary", etc. Still a red flag?

NaN years ago

undefined

iLoveOncall 14 days ago

He lives in a GenAI bubble where everyone is self-congratulating about the usage of LLMs.

The reality is that there's not a single critical component anywhere that is built on LLMs. There's absolutely no reliance on models, and ChatGPT being down has absolutely no impact on anything beside teenagers not being able to cheat on their homeworks and LLM wrappers not being able to wrap.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

imiric 14 days ago

It's fascinating to see his gears grinding at 22:55 when acknowledging that a human still has to review the thousand lines of LLM-generated code for bugs and security issues if they're "actually trying to get work done". Yet these are the tools that are supposed to make us hyperproductive? This is "Software 3.0"? Give me a break.

rwmj 14 days ago

Plus coding is the fun bit, reviewing code is the hard and not fun bit, arguing with an overconfident machine sound like it'll be worse even than that. Thankfully I'm going to retire soon.

NaN years ago

undefined

NaN years ago

undefined

poorcedural 14 days ago

Because we are still using code as a proof that needs to be proven. Software 3.0 will not be about reviewing legible code, with its edge-cases and exploits and trying to impersonate hardware.

William_BB 14 days ago

[flagged]

AdieuToLogic 14 days ago

It's an interesting presentation, no doubt. The analogies eventually fail as analogies usually do.

A recurring theme presented, however, is that LLM's are somehow not controlled by the corporations which expose them as a service. The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

Also, the OS analogy doesn't make sense to me. Perhaps this is because I do not subscribe to LLM's having reasoning capabilities nor able to reliably provide services an OS-like system can be shown to provide.

A minor critique regarding the analogy equating LLM's to mainframes:

  Mainframes in the 1960's never "ran in the cloud" as it did
  not exist.  They still do not "run in the cloud" unless one
  includes simulators.

  Terminals in the 1960's - 1980's did not use networks.  They
  used dedicated serial cables or dial-up modems to connect
  either directly or through stat-mux concentrators.

  "Compute" was not "batched over users."  Mainframes either
  had jobs submitted and ran via operators (indirect execution)
  or supported multi-user time slicing (such as found in Unix).

distalx 14 days ago

Hang in there! Your comment makes some really good points about the limits of analogies and the real control corporations have over LLMs.

Plus, your historical corrections were spot on. Sometimes, good criticisms just get lost in the noise online. Don't let it get to you!

furyofantares 14 days ago

> The presenter made certain to identify three interested actors (governments, corporations, "regular people") and how LLM offerings are not controlled by governments. This is a bit disingenuous.

I don't think that's what he said, he was identifying the first customers and uses.

NaN years ago

undefined

jppope 14 days ago

Well that showed up significantly faster than they said it would.

dang 14 days ago

The team adapted quickly, which is a good sign. I believe getting the videos out sooner (as in why-not-immediately) is going to be a priority in the future.

seneca 14 days ago

Classic under promise and over deliver.

I'm glad they got it out quickly.

NaN years ago

undefined

aaron695 14 days ago

[dead]

sneak 14 days ago

Can we please stop standardizing on putting things in the root?

/.well-known/ exists for this purpose.

example.com/.well-known/llms.txt

https://en.m.wikipedia.org/wiki/Well-known_URI

jph00 14 days ago

You can't just put things there any time you want - the RFC requires that they go through a registration process.

Having said that, this won't work for llms.txt, since in the next version of the proposal they'll be allowed at any level of the path, not only the root.

NaN years ago

undefined

NaN years ago

undefined

NaN years ago

undefined

andrethegiant 14 days ago

https://github.com/AnswerDotAI/llms-txt/issues/2

ws169144 14 days ago

[dead]

lngnmn2 14 days ago

[dead]

varelse 14 days ago

[dead]

boxboxbox4 14 days ago

[dead]

sahil_sharma0 14 days ago

[dead]

kat529770 12 days ago

[dead]

kat529770 13 days ago

[dead]

researchai 14 days ago

I can't believe I googled most of the dishes on the menu every time I went to the Thai restaurant. I've just realised how painful that was when I saw MenuGen!

nottorp 14 days ago

In the era of AI and illiteracy...

black_13 14 days ago

[dead]

moralestapia 14 days ago

[flagged]

dang 14 days ago

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

"Don't be snarky."

https://news.ycombinator.com/newsguidelines.html

NaN years ago

undefined

paganel 14 days ago

[flagged]