Moving on from ChatGPT

I had an unsettling experience a few days back where I was booping along, writing some code, asking GPT-4 some questions, when I got the following message: “You’ve reached the current usage cap for GPT-4, please try again after 4:15 pm.” I clicked on the “Learn More” link and basically got a message saying “we actually can’t afford to give you unlimited access to GPT-4 at the price you are paying for your membership ($20/mo), would you like to pay more???”

It dawned on me that OpenAI is trying to speedrun enshittification. The classic enshittification model is as follows: 1) hook users on your product to the point that it is a utility they cannot live without, 2) slowly choke off features and raise prices because your users are captured, 3) profit. I say it’s a speedrun because OpenAI hasn’t quite accomplished (1) or (2). I am not hooked on its product, and it is not slowly choking off features and raising prices; rather, it appears set to do that right away.

While I like having a coding assistant, I do not want to depend on an outside service charging a subscription to provide me with one, so I immediately cancelled my subscription. Bye, bitch.

But then I got to thinking: people are running LLMs locally now. Why not try that? So I procured an Nvidia RTX 3060 with 12GB of VRAM (from what I understand, the entry-level hardware you need to run AI-type stuff) and plopped it into my Ubuntu machine running on a Ryzen 5 5600 and 48GB of RAM. I figured from poking around on Reddit that running an LLM locally was doable but eccentric and would take some fiddling.

Reader, it did not.

I installed Ollama and had codellama running locally within minutes.

It was honestly a little shocking. It was very fast, and with Ollama, I was able to try out a number of different models. There are a few clear downsides. First, I don’t think these “quantized” (I think??) local models are as good as GPT-3.5, which makes sense because they are quite a bit smaller and running on weaker hardware. There have been a couple of moments where the model just obviously misunderstood my query.

But codellama gave me a pretty useful critique of this section of code:

… which is really what I need from a coding assistant at this point. I later asked it to add some basic error handling for my “with” statement and it did a good job. I will also be doing more research on context managers to see how I can add one.
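
For anyone who hasn’t bumped into these before, the two ideas look roughly like this; a generic, made-up sketch (not my actual code, and not codellama’s exact suggestion):

```python
# Generic sketch of the two ideas above: error handling around a "with"
# statement, and a small hand-rolled context manager. Made-up example,
# not the code codellama actually reviewed.
import time
from contextlib import contextmanager

def read_config(path: str) -> str:
    try:
        with open(path) as f:  # the file is closed even if reading blows up
            return f.read()
    except FileNotFoundError:
        print(f"{path} not found, falling back to defaults")
        return ""

@contextmanager
def timed(label: str):
    """Report how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label} took {time.perf_counter() - start:.2f}s")

# usage:
# with timed("loading config"):
#     cfg = read_config("settings.toml")
```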

Another downside is that the console is not a great UI, so I’m hoping I can find a solution for that. The open-source, locally-run LLM scene is heaving with activity right now, and I’ve seen a number of people indicate they are working on a GUI for Ollama, so I’m sure we’ll have one soon.
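
In the meantime, the terminal isn’t the only way in: Ollama also serves a small HTTP API on localhost, so you can script against it. Here’s a rough sketch of what that looks like in Python, assuming the default port (11434) and the requests library; treat the details as illustrative rather than gospel:

```python
# Rough sketch: ask a locally running Ollama model a question from Python.
# Assumes Ollama is serving on its default port (11434) and that the
# "codellama" model has already been pulled. Illustrative, not gospel.
import requests

def ask_local_llm(prompt: str, model: str = "codellama") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # generation can take a while on modest hardware
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask_local_llm("Suggest error handling for a file-reading with statement."))
```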

Anyway, this experience has taught me that an important thing to watch now is that anyone can run an LLM locally on a newer Mac or by spending a few hundred bucks on a GPU. While OpenAI and Google brawl over the future of AI, in the present, you can use Llama 2 or Mistral, tuned in any number of ways, to do basically anything you want. Coding assistant? Short story generator? Fake therapist? AI girlfriend? Malware? Revenge porn??? The activity around open-source LLMs is chaotic and fascinating, and I think it will be the main AI story of 2024. As more and more normies get access to this technology with guardrails removed, things are going to get spicy.

23 responses to “Moving on from ChatGPT”

  1. @pjk I am not surprised at all because it fits so much with OpenAI’s business model. They have to go fast, ingest training data recklessly, and basically cause a lot of harm to get the growth in model size/profit that they need before the law catches up to them and cuts them off at the knees, which they know will happen soon. Of course they have to speedrun enshittification. They don’t have time.

  2. @pjk Fascinating that LLMs can be run locally. I had no idea. Does this provide a more ecologically sustainable model for their huge energy consumption, or does it distribute their energy usage out to the public?

    1. @Brad_Rosenheim
      The language models that people run on their own hardware are typically much smaller than the OpenAI models. Thus they also consume a lot less energy. Of course the energy and hardware costs are paid by the user, making it more transparent.
      @pjk

      1. @osma @pjk With your new equipment, do you expect to see an increase in your household energy usage, or will any change be "in the weeds"?

        1. @Brad_Rosenheim
          I've only run small, quantized (compressed) LLMs on my regular Linux PC laptop. It's a bit slow, unless you have a GPU or a high-end M1 or M2 Mac, which I don't. The energy cost is negligible. It's similar to video encoding, playing 3D games or other heavy processing – the fan is spinning more than usual. Brewing the tea consumed during this activity probably takes more energy…
          @pjk

  3. @pjk I got me an old Kepler card from a year ago, but drivers and python versions have marched on, leaving it incompatible without a ton of work on my end.

    Sigh.

  4. @pjk It's very interesting to consider locally run LLMs as an alternative to OpenAI.

    Pardon me if this is a naive question, but is there a subscription fee and usage limit because of ChatGPT's large carbon footprint and water usage? I get that a large part of it is also corporate greed, but I had read that a standard ChatGPT paragraph takes a horrendous amount of water to generate. Is this issue somewhat mitigated with locally run LLMs?

    1. @AlexCorby @pjk

      is there a subscription fee and use limit because of the large carbon footprint and water usage for ChatGPT?

      Absolutely not, they don't give a single shit about their carbon emissions or the environmental impact of their water usage. The pragmatic reason for the limit (besides corporate greed, as you mentioned) is that they're literally running out of computing power, of energy to run the datacenters, and of water to cool them, even as things stand right now.

      Is this issue somewhat mitigated with locally run LLM’s?

      Yes, they’re much smaller and therefore consume much less energy. However, running an RTX 3060 at high load for a significant amount of time will still be noticeable on your electricity bill.

  5. @pjk I wonder whether a less-powerful AI model could be run on modest hardware, like a cheap smartphone or a 5-year-old entry-level PC… 🤔👿

    1. @jcastroarnaud @pjk I was thinking about the same thing. I was wondering what the minimum hardware would be to still get useful results from these models.

  6. @pjk it is unfortunate that text models are just… too big to run anything close to GPT 3+ locally T_T

    1. @agatha @pjk Did you actually read til the end?

      Hot take: Either you have a problem that is simple enough that *any* transformer-based model with *some* sensible tweaking can solve it, or it is so complex that effectively none can.

      1. @ftranschel @pjk how does that invalidate my point? just because it’s not useful? i was explicitly talking about GPT-3 and later
        you don’t have to be mean either

        1. @agatha @pjk Sorry, I didn't mean to come across as rude.

          More on point: I don't think the performance is that much different. Llama is arguably closer to GPT-3.5 than to GPT-3. And Mistral is better in a lot of settings.

          1. @ftranschel @pjk i am interested in the coding capabilities of it, chat gpt is, in my experience, god awful at it, maybe a more fine tuned model works well
            i wish AI had better support for AMD and ROCm, unfortunately nvidia has a monopoly on this (quite a lot of them work on non-nvidia gpus, and you can even run on cpus, but results in regards to speed and VRAM may vary)

          2. @ftranschel @pjk i say maybe even though op mentioned it did work well for them, cause this varies a lot from language to language and application to application, that's all

          3. @agatha @pjk I see. Did you try Dolphin Mistral or Wizard Coder?

  7. @pjk Thanks for sharing. I especially found the enshittification speedrun explanation helpful, and the llama insight. Saving this for later.

  8. @pjk I see code assistance as one of the few serious areas where LLMs could add value.

    What is the competitive space like specifically for code assistants?

    Are there ones not based on ChatGPT? Are there ones based in Europe? Based on MistralAI perhaps?

    If there is enough competition OpenAI might not be able to pull such a stunt off.

    #CodeAssistant #CodingAssistant #LLM #AI #ChatGPT #Mistral

  9. @pjk
    Maybe check out llamafile. One giant executable with the LLM and web service built in. Scroll down the GitHub readme for the flag to make use of your GPU too.

    1. that’s pretty cool! there’s gonna be so many more of these things rolled out over the next year.

  10. @pjk

    Thanks for this.
    I think I may try this on runpod first.
    I can get an RTX 3090 machine for around 40 cents an hour or less. It adds up over time, but I don't have a big GPU to hand at the moment.

    If I opt for smaller boxes, I can get down to less than 20 cents an hour.
