
1 · How AI makes images, music & text

You type a sentence and a few seconds later there's a painting, a song, or a finished paragraph. It feels like magic. It isn't — and understanding what's actually happening underneath is what separates someone who uses creative AI well from someone who just gets surprised by it.

The tools that make new pictures, sounds, or words are called generative AI ("generative" just means it generates — it makes new stuff, instead of only sorting or labeling things). Underneath, they all run on the same basic idea: learn patterns from a huge pile of examples, then produce something new that fits those patterns.

  • Text tools (chatbots) read enormous amounts of human writing and learn which words tend to follow which. When you prompt one, it predicts likely next words, over and over, until it has a reply. It's a very sophisticated next-word predictor.
  • Image tools learn from huge collections of pictures paired with descriptions. They learn what "a golden retriever" or "a watercolor sunset" tends to look like, then build a new image that matches your words — usually by starting from visual noise and refining it step by step.
  • Music/audio tools learn patterns in sound — melody, rhythm, instruments, style — from large collections of recordings, then generate new audio that fits the style you ask for.
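To make "learn patterns, then predict the next word" concrete, here is a toy sketch in Python. It is only an illustration: real chatbots use large neural networks and look at far more context than the single previous word, and they usually pick among several likely words instead of always taking the top one. The function names (learn_patterns, generate) and the example sentences are made up for this sketch.

```python
from collections import Counter, defaultdict


def learn_patterns(text):
    """Count, for every word in the examples, which words follow it and how often."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=8):
    """Build a reply by repeatedly appending the most common next word."""
    reply = [start]
    for _ in range(max_words - 1):
        options = follows.get(reply[-1])
        if not options:
            break  # this word never had a follower in the examples
        reply.append(options.most_common(1)[0][0])
    return " ".join(reply)


# Hypothetical "training data": a tiny pile of human-written sentences.
examples = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = learn_patterns(examples)
print(generate(model, "the"))
```

Everything the sketch can say comes from the example sentences it was given. It has no idea what a cat is; it only knows that "cat" often came after "the".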

Notice the thread running through all three: every one of them learned from work that humans made. The paintings, songs, photos, and writing in that training data came from real artists, musicians, photographers, and writers. Hold onto that fact — it's the root of almost every ethics question later in this course.

If you remember one thing per type | Text | Images | Music
What it learned from | mountains of human writing | images + their descriptions | recordings + their styles
What it's really doing | predicting likely next words | building an image to match words | generating audio to match a style
What it is not doing | "knowing" facts | "drawing from imagination" | "feeling" the music

Plain-words summary: generative AI is a pattern machine. It's not imagining, feeling, or understanding — it's recombining patterns it learned from human-made work into something new that fits your request.

Think about it. All three tools "learned from examples." In one sentence, where did all those examples actually come from — and why might that matter to the people who made them?
