• A slight correction

    I’ve been thinking about it, and the conclusion of my last post was wrong: I wrote:

    My suspicion is that this will eventually dawn on management and we will see corporations implementing policies to restrict or at least monitor how GenAI is used in the workplace. Eventually, GenAI use is going to be a scarlet letter, an indication that a product, service, or professional is low-effort and poor quality.

    That’s not it. Corporations don’t care at all if their production is mediocre, so long as it is profitable. If using GenAI makes the product 50% shittier but 5% more profitable, they will actually mandate that their employees use GenAI.

    And this is the problem: GenAI use will go from a neat hack that a corner-cutting worker bee figured out to shorten their work day to a top-down requirement for employees to be able to handle a ballooning workload.

    So, if you are a professional and you like your job and the satisfaction of producing 5 good things per day, tough shit, you will now have to produce 20 mediocre things a day using this GenAI-forward SaaS solution for which your employer just signed a 5-year support contract.

    Everything will speed up. You will have to use GenAI to draft e-mails because you won’t have time to do it yourself. You will have to use it to write memos, copy, presentations, and analyses, summarize meetings, summarize research, write performance reviews, draft annual reports, because the three other people that were supposed to be sharing the workload retired or got laid off or found a new job and the positions were never filled.

    The job that used to be satisfying to you will become stressful and dull because instead of doing good work, you have to use GenAI to churn out shit.

    That’s what’s coming: The enshittification of jobs.

    ————————–

  • What kind of a person would actually use GenAI??

    GenAI has been available to the public for, what, a year and a half? And now that we’ve settled into it, I would say there are two kinds of people who lean in to using GenAI:

    ONE: People who are not familiar with the subject matter in question.

    If you do not know JavaScript, ChatGPT’s JavaScript looks like wizardry. If you do know JavaScript, it can look like spaghetti. Likewise, if you are not a writer (or, let’s be frank, a reader), Claude seems brilliant. If you are, you’ve seen one Claude paragraph, you’ve seen ’em all.

    I’ve seen multiple demos of GenAI translation features where a person who does not know one of the languages in the language pair gets some output and says “wow, that’s amazing!” with absolutely no way of knowing whether the translation was accurate or even vaguely correct.

    Subject matter noobs don’t know what they don’t know, and GenAI seems confident, so it easily draws in the naïve.

    TWO: People who are familiar with the subject matter in question, but don’t care if the output is mediocre or incomplete.

    Let’s be honest, many people cut corners, and GenAI is the perfect corner-cutting tool. They don’t care if their e-mails sound like they came out of a can, they don’t care if the memo summary is incomplete, they don’t care if the people in the image have extra fingers, etc.

    The genuinely sloppy users of GenAI are what they are. Some people aren’t paid for quality work, and so they don’t produce quality work. We’ve all seen the campaign ads where the faces of the people in the background are melting. Or the outright scams like fake e-books and SEO spam. Fine. These people will always exist, this stuff will always be around.

    More concerning are the professionals who get GenAI output that sounds plausible, say “good enough,” and call it a day, without even checking to see if this frequently wrong technology in fact performed the task correctly.

    I’m thinking about doctors who use GenAI to transcribe and then summarize conversations with patients; standards compliance reviewers who use it to check large PDFs against requirements; lawyers who use it to write contracts; technical writers who use it to write white papers. Lots of documents aren’t really produced in order to be read: They are produced in order to check a box, even though if they are actually needed someday, it is crucial that they be correct and complete.

    But GenAI is a technology that makes it very easy for people working in areas with poor visibility to produce deliverables that look plausible and complete, and without a detailed review, no one will know any different.

    My suspicion is that this will eventually dawn on management and we will see corporations implementing policies to restrict or at least monitor how GenAI is used in the workplace. Eventually, GenAI use is going to be a scarlet letter, an indication that a product, service, or professional is low-effort and poor quality.

    Unfortunately, it could be some time before this happens, and in the meantime, mediocrity accrues.

    ————————–

  • Actually, building a bot is fun

    One thing you notice right away about LLMs is they bear a striking resemblance to that ubiquitous internet character, the reply-guy: they always have an answer, they are brimming with confidence, and they are often completely wrong.

    It seemed only natural, then, to turn one into a full-time online version of itself, RGaaS (Reply-Guy as a Service), always on, always ready to reply to anything when summoned, with an “Actually,” followed by his (it’s always a “he”) opinion on what you should be thinking.

    And so I built the Actually Bot.

    Basically, the way it works is that any user can reply to any Mastodon post (or I guess any post on the ActivityPub protocol) and tag the Actually Bot and it will reply with its most mediocre take on why everyone else is wrong, or at least not quite right.

    The reply-guys can all move on to something else now, I have automated them out of a job.

    This was a really fun project with a lot of separate parts. First, I had to modify an LLM to give it the right “personality” for this job. To do this, I used Ollama, a tool for running open-source LLMs locally, on your own machine. I used the “customized prompt” feature: You basically write a little modelfile that Ollama uses to tweak the LLM of your choosing to give it certain characteristics. I went with the new Llama3:8b model from Meta as my base model, then put the following instructions in my modelfile:

    FROM llama3
    
    PARAMETER temperature 3
    
    SYSTEM """
    You are receiving social media posts as prompts, and your job is to reply to these prompts. Please start all your replies with the word "actually". Reply as if you know more than everyone. You must recontextualize every prompt and correct or amend it, informing the prompter that they are wrong, even if ever so slightly. You write in short sentences. Your replies must be short. Keep them to about three sentences. This is essential: keep your replies short. They must be under 500 characters.
    """

    Then I ran the following command in the console:

    ollama create actually_llama -f ./actually_llama

    … and my model was ready to roll. Next, I needed a program to connect to the Ollama API to send the LLM prompts and get responses. Python was great for that, as both Ollama and Mastodon have solid Python libraries. Probably the slowest part was picking through Mastodon.py to figure out how the methods work and what exactly they return. It’s a very robust library with a million options, and fortunately it’s also extremely well documented, so while it was slow going, I was able to whack it together without too much trouble.
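
    The Ollama half of that turned out to be tiny. With the ollama Python library and the actually_llama model built above, sending a prompt and getting a reply back is more or less this (a minimal sketch; the prompt is just a placeholder):

    import ollama

    def get_actually_reply(prompt_text):
        # Send the prompt to the locally running Ollama server and return the model's text.
        response = ollama.generate(model="actually_llama", prompt=prompt_text)
        return response["response"]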

    I’m not going to get into all the code here, but basically, I wrote a simple method that checks mentions, grabs the text of a post and the post it is replying to, and returns them for feeding into the LLM as the prompt.
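
    In rough outline, though, that method has a shape something like this (a simplified sketch, not the bot’s actual code, assuming Mastodon.py’s notifications and status calls and an already-authenticated client):

    import re

    def check_mentions(mastodon, since_id=None):
        # Collect new mentions and pair each one with the text of the post it replies to.
        prompts = []
        for note in mastodon.notifications(since_id=since_id):
            if note["type"] != "mention":
                continue
            status = note["status"]
            text = re.sub(r"<[^>]+>", " ", status["content"])  # Mastodon returns HTML
            parent_text = ""
            if status["in_reply_to_id"]:
                parent = mastodon.status(status["in_reply_to_id"])
                parent_text = re.sub(r"<[^>]+>", " ", parent["content"])
            prompts.append((status, (parent_text + "\n" + text).strip()))
        return prompts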

    Despite my very careful, detailed, and repetitive instructions to keep replies to no more than 500 characters, LLMs can’t count, and they are very verbose, so I had to add a cleanup method that trims each reply down to under 500 characters before it gets posted. Then I wrote another method for sending the prompt to Ollama and returning the response.
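
    The trimming helper is about as dumb as it sounds (a sketch; the real thing can try to cut at a sentence boundary, as here):

    def trim_reply(text, limit=500):
        # The model ignores its length instructions, so enforce the limit here.
        if len(text) <= limit:
            return text
        cut = text[:limit]
        last_stop = cut.rfind(". ")  # prefer to end on a sentence boundary if there is one
        return cut[: last_stop + 1] if last_stop > 0 else cut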

    The main body starts off by getting input for the username and password for login, then it launches a while True loop that calls my two functions, checking every 60 seconds to see if there are any mentions and replying to them if there are.
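
    Stripped down, the main body is just a polling loop (again a sketch: the file and instance names are placeholders, it leans on the helper sketches above, and a real version would also keep track of which mentions it has already answered):

    import time
    from mastodon import Mastodon

    def main():
        username = input("Username: ")
        password = input("Password: ")
        mastodon = Mastodon(client_id="actually_bot_clientcred.secret",  # placeholder app credentials
                            api_base_url="https://mastodon.example")      # placeholder instance
        mastodon.log_in(username, password)

        while True:
            for status, prompt_text in check_mentions(mastodon):
                reply = trim_reply(get_actually_reply(prompt_text))
                mastodon.status_post(f"@{status['account']['acct']} {reply}",
                                     in_reply_to_id=status["id"])
            time.sleep(60)  # check for new mentions every minute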

    OK it works! Now came the hard part, which was figuring out how to get to 100% uptime. If I want the Actually Bot to reply every time someone mentions it, I need it to be on a machine that is always on, and I was not going to leave my PC on for this (nor did I want it clobbering my GPU when I was in the middle of a game).

    So my solution was this little guy:

    … a Lenovo ThinkPad with a 3.3GHz quad-core i7 and 8gb of RAM. We got this refurbished machine when the pandemic was just getting going and it was my son’s constant companion for 18 months. It’s nice to be able to put it to work again. I put Ubuntu Linux on it and connected it to the home LAN.

    I actually wasn’t even sure it would be able to run Llama3:8b. My workstation has an Nvidia GPU with 12gb of VRAM and it works fine for running modest LLMs locally, but this little laptop is older and not built for gaming and I wasn’t sure how it would handle such a heavy workload.

    Fortunately, it worked with no problems. For a chatbot, waiting 2 minutes for a reply is unacceptable, but for a bot that posts to social media, it’s well within the range of what I was shooting for, and the quality of the responses didn’t seem to suffer either.

    The last thing I had to figure out was how to actually run everything from the Lenovo. I suppose I could have copied the Python files and tried to recreate the virtual environment locally, but I hate messing with virtual environments and dependencies, so I turned to the thing everyone says you should use in this situation: Docker.

    This was actually great because I’d been wanting to learn how to use Docker for a while but never had the need. I’d installed it earlier and used it to run the WebUI front end for Ollama, so I had a little bit of an idea how it worked, but the Actually Bot really made me get into its working parts.

    So, I wrote a Docker file for my Python app, grabbed all the dependencies and plopped them into a requirements.txt file, and built the Docker image. Then I scp’d the image over to the Lenovo, spun up the container, and boom! The Actually Bot was running!
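
    None of these pieces is very big. The Dockerfile for an app like this can be close to minimal (a sketch, assuming the bot lives in a single actually_bot.py next to its requirements.txt):

    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["python", "actually_bot.py"]

    Building it, saving it to a tarball, and shipping it over is then something like (hostname and user are placeholders):

    docker build -t actually_bot .
    docker save -o actually_bot.tar actually_bot
    scp actually_bot.tar user@lenovo:~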

    Well, OK, it wasn’t that simple. I basically had to learn all this stuff from scratch, including the console commands. And once I had the Docker container running, my app couldn’t connect to Ollama because, it turns out, Ollama runs as a server, and I had to launch the container with a flag indicating that it shares the host’s network settings.
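
    On the laptop, the working incantation ended up being along these lines (reconstructed, not a transcript): load the saved image, then run it on the host network so the containerized app can reach the Ollama server listening on localhost:11434.

    docker load -i actually_bot.tar
    docker run -d --name actually_bot --network=host actually_bot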

    Then once I had the Actually Bot running, it kept crashing when people tagged it in a post that wasn’t a reply to another post. So, went back to the code, squashed bug, redeploy container, bug still there because I didn’t redeploy the container correctly. There was some rm, some system prune, some struggling with the difference between “import” and “load” and eventually I got everything working.

    Currently, the Actually Bot is sitting on two days of uninterrupted uptime with ~70 successful “Actually,” replies, and its little laptop home isn’t even on fire or anything!

    Moving forward, I’m going to tweak a few things so I can get better logging and stats on what it’s actually doing so I don’t have to check its posting history on Mastodon. I just realized you can get all the output that a Python script running in a Docker container prints with the command docker logs [CONTAINER], so that’s cool.

    The other thing I’d like to do is build more bots. I’m thinking about spinning up my own Mastodon instance on a cheap hosting space and loading it with all kinds of bots talking to each other. See what transpires. If Dead Internet Theory is real, we might as well have fun with it!

    ————————–

  • Some lessons before moving on

    I finally “finished” my first major Python application the other day. It’s an RSS reader you can use to parse large lists of RSS feeds, filter for keywords, and save the results to an HTML file for viewing in a browser.

    After I spent easily a month building a console-based text command user interface for this thing, it dawned on me: “I bet there’s already a module for this.” Yep, Cmd. So, two things about that.

    First, it’s a great example of a major failure for Google and a major success for ChatGPT. Initially, I thought what I needed was a “command line interface,” so I did some extensive googling for that and all I came up with was stuff like argparse, which allows you to run Python programs from the command line with custom arguments (home:~$ python3 myscript.py [my_command]). Not finding anything useful, I went on to build what I actually needed, which is a “line-oriented command interpreter.”

    The problem with Google is it’s hard to find things when you can describe them, but don’t know what they’re called, especially in technical fields with very specific jargon. However, once I had my epiphany that this thing probably already exists somewhere, I was able to describe what I wanted to do to ChatGPT and it immediately informed me of the very nifty Cmd class.
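
    For anyone else who goes looking for it under the wrong name: Cmd lives in the standard library’s cmd module, and a toy interpreter looks roughly like this (the commands here are made up, not DonkeyFeed’s):

    import cmd

    class DonkeyShell(cmd.Cmd):
        intro = "Welcome. Type help or ? to list commands."
        prompt = "(donkeyfeed) "

        def do_run(self, arg):
            """Run saved searches: RUN [filter name]"""
            print(f"running {arg or 'all filters'}...")

        def do_quit(self, arg):
            """Exit the program."""
            return True  # returning True ends cmdloop()

    if __name__ == "__main__":
        DonkeyShell().cmdloop()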

    Second, I’m not mad that I spent a month building a thing that already exists. Really. I learned so much going through the absurd process of building my own line-oriented command interpreter:

    • Loops, baby, wow do I know some loops. Nested loops, infinite loops, breaking a loop, etc. For, while, you name it. Loops are my jam;
    • If-else logic, my brain is mashed into a horrible, spidery decision tree. And by extension, boolean logic. Fucking hell, that was trial by fire. For example, I did not know None evaluates to False because in Python, it is “falsy” (there’s a tiny example of this right after the list). I spent way too many hours trying to debug this, but now I will never forget it;
    • I learned some Pytest and wrote a ton of unit tests for my command interpreter. I’m not going to say I’m good at writing unit tests yet, because I’m not, but I understand the concepts and the importance of unit testing, and I have a few tools to work with.
    • I know what I want from a command interpreter. OK, I built a dumb one, but when I refactor my code to replace it with Cmd, I already know what features to look for and how I want to deploy it.
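
    Since the falsy thing cost me so many hours, here is the whole lesson written as the kind of Pytest test I mentioned (illustrative only):

    def test_none_is_falsy():
        # None counts as False in a boolean context...
        assert not None
        # ...but it is not the same thing as False itself.
        assert None is not False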

    I’m going to put DonkeyFeed down for a bit now. I have to finish up some actual classes (SQL lol) for a degree I’m working on as the term comes to an end, and I have a couple other Python projects I want to pursue.

    When I pick it up again, yeah, I’ll probably refactor the user interface to make it more robust and stable, with better error handling and help functions. And then I need to find a better way to package it (yikes) for distribution.

    I guess the main thing I’m learning as I shelve this one for a bit is you never really finish a piece of software, you just kind of get it to a point where it’s not completely broken. Maybe you’ll have time to fix it more later!

    ————————–

  • Technology of the future

    I’m coming around to the idea that generative artificial intelligence is the new blockchain. Leave aside the whole investment angle for a second, let’s just talk about the technology itself.

    When Bitcoin (and cryptocurrency in general) really started to hit, the boosters promised a thousand applications for the blockchain. This wasn’t just a weird curiosity for the kind of people who encrypt their e-mail, or even just a new asset class: It was potentially everything. Distributed ledgers would replace financial transaction databases. They could be used to track property ownership, contracts, produce, livestock. It would replace WiFi, and the internet itself! The future was soon.

    But the future never arrived, for the simple fact–as flagged early on by many people who understood both blockchain technology and the fields it would supposedly disrupt–that blockchain wasn’t as good as existing technologies like relational databases.

    For one thing, it is expensive to run at scale, and updating a distributed ledger is incredibly slow and inefficient, compared to updating the kind of relational databases banks and credit card companies have used for decades. But also, the immutability of a distributed ledger is a huge problem in sectors like finance and law, where you need to be able to fix errors or reverse fraud.

    These problems aren’t really things you can adjust or “fix.” They are fundamental to the blockchain technology. And yet, during the crypto boom, a thousand startups bloomed promising magic and hand-waving these fundamental problems as something that would be solved eventually. They weren’t.

    Turning to generative artificial intelligence, I see the same pattern. You have a new and exciting technology that produces some startling results. Now everyone is launching a startup selling a pin or a toy or a laptop or a search engine or a service running on “AI.” The largest tech companies are all pivoting to generative artificial intelligence, and purveyors of picks-and-shovels like Nvidia are going to the Moon.

    But this is despite the fact that, like blockchain before it, generative artificial intelligence has several major problems that may not actually be possible to fix because they are fundamental to the technology.

    First, it doesn’t actually understand anything.1 It is not an intelligence, it is a trillion decision trees in a trench coat. It is outputting a pattern of data that is statistically related to the input pattern of data. This means it will often actually give you the opposite of what you request, because hey, the patterns are a close match!

    Second, because of this, the output of a generative artificial intelligence is unreliable. It is a Plinko board the size of the surface of the Moon. You put your prompt in the top and you genuinely don’t know what will come out the bottom. Even if your prompt is exactly the same every time, output will vary. This is a problem because the whole point of computers is that they are predictable. They do exactly the thing you tell them to do in the code language. That’s why we can use them in commercial airlines and central banks and MRI machines. But you can’t trust a generative artificial intelligence to do what you tell it to do because sometimes, and for mysterious reasons, it just doesn’t.

    And third, a super-problem that is sort of a combination of the above two problems is that generative artificial intelligences sometimes just… make stuff up. AI people call it “hallucinating” because it sounds cooler, like this otherwise rational computer brain took mushrooms and started seeing visions. But it’s actually just doing what it is designed to do: output a pattern based on an input pattern. A pattern recognition machine doesn’t care if something is “true,” it is concerned with producing a valid data pattern based on the prompt data pattern, and sometimes that means making up a whole origin story involving a pet chicken named “Henrietta”, because that’s a perfectly valid data pattern. No one has figured out how to solve this problem.

    Who knows, maybe Google, Microsoft, Meta, and OpenAI will fix all this! Google’s new 1 million token context for its latest Gemini 1.5 model sounds promising. I guess that would be cool. My sense, though, is that despite all the work and advancements, the problems I outline here persist. I see so many interesting ideas and projects around building AI agents, implementing RAG, automating things, etc. but by the time you get to the end of the YouTube tutorial, eh, turns out it doesn’t work that well. Promising, but never quite there yet. Like with blockchain, the future remains persistently in the future.

    1. The examples I’m using here are from Gary Marcus’s Substack, which is very good, check it out. ↩︎

    ————————–

  • Done, not done

    I just finished my first Python app, so of course now I have to re-write it. The app is called DonkeyFeed, it’s a simple little dingus that is supposed to filter RSS feeds for certain keywords: Users save a list of RSS feed links with their associated keywords, then run the searches anytime and save any results in an easy-to-read HTML file.

    My original idea was to use a GUI, but then I figured out that Tkinter is janky as hell and pretty bad for doing anything dynamic. So I switched to a command line user interface, and that went pretty OK! I’ve actually got a working product now that works.

    But now I’m realizing I designed the user interface as a rigid if-then flowchart of options, when what I really should be doing is an input() loop where users can enter commands with arguments to run, add, remove, or modify filters right from the cursor, instead of poking around in menu options.
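
    Something in this direction, in other words (a very rough sketch of the idea, not working DonkeyFeed code):

    def command_loop():
        # Read commands like "run", "add <url> <keyword>", "remove <name>" until the user quits.
        while True:
            line = input("donkeyfeed> ").strip()
            if not line:
                continue
            command, *args = line.split()
            if command == "quit":
                break
            elif command == "run":
                print("running filters:", args or "all")
            elif command in ("add", "remove", "modify"):
                print(f"{command} filter with arguments {args}")
            else:
                print(f"unknown command: {command}")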

    Fortunately, I focused this project on using a class-based structure, with all the moving parts put into separate objects. There’s a class for handling the .JSON file with the RSS roster, a class for parsing the RSS feeds, a class for handling configurations, a class for the user menus, etc., all with their various associated methods. So changing the UI won’t mean untangling a bunch of stuff, more like reassembling existing parts into different loops and logic.

    In general, I’m starting to feel like that’s what most of coding is: take a thing and plug it into, get it to talk to, or wrap it in another thing. You build your code as a series of things that can be made to work with other things, then take it home, throw it in a pot, add some broth, a potato… baby you got a stew going! Eh, I digress.

    ————————–

  • Asymmetry

    I had a fun time yesterday playing with customized prompts in Ollama. These basically wrap the prompts you give to your local LLM with a standing prompt that conditions all responses. So, if you want your LLM to respond to queries briefly, you can specify that. Or if you want it to respond to every prompt like Foghorn Leghorn, hey. You can also adjust the “temperature” of the model: The spicier you make it, the weirder things get.

    First I tried to make a conspiracy theorist LLM, and it didn’t really work. I used llama2-uncensored and it was relentlessly boring. It kind of “both sides”ed everything and was studiously cautious about taking a position on anything. (Part of the problem may have been that I had the temperature set at 0.8 because I thought the scale was 0-1.)

    So I moved on to a different uncensored model, dolphin-mixtral, and oh boy. This time, I set the temperature to 2 and we were off to the races:

    OK buddy, go home, you’re drunk.

    Then I cranked the temperature up to 3 and made a mean one…

    … and a kind of horny/rapey one that I won’t post here. It was actually very easy, in a couple of minutes, to customize an open-source LLM to respond however you want. I could have made a racist one. I could have made one that endlessly argues about politics. I could have made one that raises concerns about Nazis infiltrating the Ukrainian armed forces, or Joe Biden’s age, or the national debt.

    Basically, there’s a lot of easy ways to customize an LLM to make everyone’s day worse, but I don’t think I could have customized one to make anything better. There’s an asymmetry built into the technology. A bot that says nice things isn’t going to have the same impact as a bot that says mean things, or wrong things.

    It got me thinking about Brandolini’s law, the asymmetry of bullshit, “The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it,” which you can generalize even further to “it’s easier to break things than to make things.”

    As the technology stands right now, uncensored open-source LLMs can be very good at breaking things– trust, self-worth, fact-based reality, sense of safety. It would be trivial to inject LLM-augmented bots into social spaces and corrupt them with hate, conflict, racism, and disinformation. It’s a much bigger lift to use an LLM to make something.

    The cliche is that a technology is only as good as its user, but I’m having a hard time imagining a good LLM user who can do as much good as a bad LLM user can do bad.

    ————————–

  • Sidetracked

    Does it ever happen where you have some plans for a day, but your son is like “I wonder if you could code Blackjack in Python” and you can’t get it out of your head, so you open up your IDE and start tapping away, and you’re breaking down the problem and sorting out the structure and putting different parts of the game into different objects and then you’re basically done, but it doesn’t work for some reason, so then you troubleshoot that, and then it doesn’t work for a different reason, so you troubleshoot that, and then you call your son over and say “hey, check it out!” and then he checks it out and it doesn’t work again for a new and unforeseen reason, and you’re kinda mad at this thing but also you can’t put it down even for a second even though your dog needs to go out and someone should do the dishes and you had plans for today, but you’ve got this burr under your saddle that you can’t get rid of so you smash one bug after another until finally you hit “execute” and you play a game of Blackjack, and then another one, and then another one, and everything works as expected, and it kinda feels like you wasted the whole morning banging your head against this thing, but in the end, you coded Blackjack in Python and it works, so that’s not nothing.

    ————————–

  • Moving on from ChatGPT

    I had an unsettling experience a few days back where I was booping along, writing some code, asking ChatGPT 4.0 some questions, when I got the following message: “You’ve reached the current usage cap for GPT-4, please try again after 4:15 pm.” I clicked on the “Learn More” link and basically got a message saying “we actually can’t afford to give you unlimited access to ChatGPT 4.0 at the price you are paying for your membership ($20/mo), would you like to pay more???”

    It dawned on me that OpenAI is trying to speedrun enshittification. The classic enshittification model is as follows: 1) hook users on your product to the point that it is a utility they cannot live without, 2) slowly choke off features and raise prices because those users are captured, 3) profit. I say it’s a speedrun because OpenAI hasn’t quite accomplished (1) and (2). I am not hooked on its product, and it is not slowly choking off features and raising prices– rather, it appears set to do that right away.

    While I like having a coding assistant, I do not want to depend on an outside service charging a subscription to provide me with one, so I immediately cancelled my subscription. Bye, bitch.

    But then I got to thinking: people are running LLMs locally now. Why not try that? So I procured an Nvidia RTX 3060 with 12gb of VRAM (from what I understand, the entry-level hardware you need to run AI-type stuff) and plopped it into my Ubuntu machine running on a Ryzen 5 5600 and 48gb of RAM. I figured from poking around on Reddit that running an LLM locally was doable but eccentric and would take some fiddling.

    Reader, it did not.

    I installed Ollama and had codellama running locally within minutes.
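
    For anyone curious, “minutes” is not an exaggeration. On Linux it is basically Ollama’s one-line install script plus a first run of the model (which pulls it down automatically):

    curl -fsSL https://ollama.com/install.sh | sh
    ollama run codellama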

    It was honestly a little shocking. It was very fast, and with Ollama, I was able to try out a number of different models. There are a few clear downsides. First, I don’t think these “quantized” (I think??) local models are as good as ChatGPT 3.5, which makes sense because they are quite a bit smaller and running on weaker hardware. There have been a couple of moments where the model just obviously misunderstands my query.

    But codellama gave me a pretty useful critique of this section of code:

    … which is really what I need from a coding assistant at this point. I later asked it to add some basic error handling for my “with” statement and it did a good job. I will also be doing more research on context managers to see how I can add one.
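
    Basic error handling around a “with” statement looks something like this (a generic sketch, not codellama’s actual suggestion or my real code; the function and file names are placeholders):

    def load_roster(path="rss_roster.json"):
        # Open the roster file, with basic error handling around the "with" block.
        try:
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            print(f"No roster file found at {path}; starting with an empty roster.")
            return ""
        except OSError as err:
            print(f"Could not read {path}: {err}")
            return ""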

    Another downside is that the console is not a great UI, so I’m hoping I can find a solution for that. The open-source, locally-run LLM scene is heaving with activity right now, and I’ve seen a number of people indicate they are working on a GUI for Ollama, so I’m sure we’ll have one soon.

    Anyway, this experience has taught me that an important thing to watch now is that anyone can run an LLM locally on a newer Mac or by spending a few hundred bucks on a GPU. While OpenAI and Google brawl over the future of AI, in the present, you can use Llama 2.0 or Mistral now, tuned in any number of ways, to do basically anything you want. Coding assistant? Short story generator? Fake therapist? AI girlfriend? Malware? Revenge porn??? The activity around open-source LLMs is chaotic and fascinating and I think it will be the main AI story of 2024. As more and more normies get access to this technology with guardrails removed, things are going to get spicy.

    ————————–

  • Team communication issues and a pivot

    I’m working on coding a desktop RSS reader with a GUI and the process is going like this: (1) design a feature, (2) work on the feature on one end, (3) find out it doesn’t work or I need to learn some new library to make it work on the other end, (4) mostly finish the feature but now I have to completely change the other end, (5) also, I’ve thought about it more and changed the parameters for the features I want, so (6) start over at (1).

    It’s frustrating, but I realized I’m not just learning coding with this process, I’m learning design and project management, and even though I am the client, the designer, the developer, and the project manager, I’m still having communication issues haha lol lmao. OK, but I have been learning-by-doing and some of the early lessons are as follows:1

    • Define the features you want from the beginning. I imagine this is kind of a moving target in many cases depending on the client, but as much as possible, I should nail down the stuff that is nail-down-able.
    • Do the back end first. Break the work down into as many pieces (classes/methods) as possible that return discrete objects that can be manipulated and used in whatever way the front-end designer (future me) decides.
    • GUIs suck, use a browser if you need to display a lot of content. Tkinter has been really interesting and I’ve enjoyed the learning process, but using it to display articles for reading is a terrible idea. The GUI should be used for interacting with the functions in my program and getting user input, and that’s it.

    Attending to the first bullet point on my little list here, and in response to some comments from a friend, I’m going to pivot (!!) away from a desktop RSS feed reader and toward a desktop RSS feed parser that searches for keywords and saves the entries that hit into an HTML document that can be opened in a browser for review. This means I can focus on building the logic of the parser, plus a mechanism for managing a database of saved RSS feeds and search terms. That’s probably a lot for now, so I’ll leave the front-end GUI for later, when I have everything running correctly from the terminal. OK? OK.
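
    The core of that parser is pleasingly small. Roughly, it could look like this (a sketch that assumes the feedparser library for the RSS side; the names and the bare-bones HTML are placeholders):

    import feedparser

    def search_feed(url, keywords):
        # Return entries whose title or summary mentions any of the keywords.
        feed = feedparser.parse(url)
        hits = []
        for entry in feed.entries:
            text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
            if any(kw.lower() in text for kw in keywords):
                hits.append(entry)
        return hits

    def save_as_html(entries, path="results.html"):
        # Write the matching entries to a simple HTML page for reading in a browser.
        rows = []
        for e in entries:
            title = e.get("title", "(no title)")
            link = e.get("link", "#")
            rows.append(f'<p><a href="{link}">{title}</a></p>')
        with open(path, "w", encoding="utf-8") as f:
            f.write("<html><body>\n" + "\n".join(rows) + "\n</body></html>")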

    1. Do not @ me about agile. I am familiar with agile. I have thoughts about it, but I will not put them here, yet. ↩︎

    ————————–