This post was last edited by chenpan85 on 2022-9-14 17:31.
AI-generated art is quietly beginning to reshape culture. Over the past few years, the ability of machine learning systems to generate images from text prompts has improved greatly in quality, accuracy, and expressiveness. Now that these tools have moved out of the lab and into the hands of everyday users, those users are creating new visual languages of expression and, most likely, new kinds of trouble.
There are only a few dozen top-tier image-generating AIs. They are complex and expensive to build, requiring access to millions of images to train the system (which looks for patterns in the pictures and replicates them) and a great deal of computation (costs vary, but a price tag in the millions of dollars is not out of the question).
At present, the output of these systems is usually treated as a novelty when it appears on magazine covers or is used to generate parodies. But as we said, artists and designers are integrating this software into their workflows, and before long AI-generated and AI-enhanced art will be everywhere. Questions of copyright (who owns the rights to the pictures, and who counts as having made them?) and potential hazards (such as biased output or AI-generated misinformation) will have to be dealt with quickly.
As the technology goes mainstream, though, one company will earn some of the credit: a 10-person lab called Midjourney, which makes an AI image generator of the same name that is available through a Discord chat server. Although the name may be unfamiliar, you have probably already seen the output of Midjourney's system on social media. To generate your own images, all you have to do is join Midjourney's Discord server and type in a prompt, and the system will make them for you.
"A lot of people ask us, why don't we just make an iOS camera app?" said David Holz, founder of Midjourney. "But people want to do things together, and if you do it on iOS, you have to build your own social network, which is hard. So if you want your own social experience, Discord is great."
Sign up for a free account and you get 25 credits, with every image generated in a public chat room where anyone can see it. After that you have to pay, either $10 or $30 a month, depending on how many images you want to produce and whether you want them to be private to you.
Recently, however, Midjourney has been expanding its model, allowing anyone to create their own Discord server with their own AI image generator. "We're moving from a Midjourney universe to a Midjourney multiverse," says Holz, who thinks the result will be incredible: an outpouring of AI-enhanced creativity that is still just the tip of the iceberg.
In an interview with The Verge, Holz talked about his ambitions for Midjourney, including his reasons for building an "engine of imagination" and why he thinks AI is more like water than a tiger.
Here's the full interview compiled by Gamelook:
Q: First, a little bit about yourself and Midjourney. What's your background and how did you get into this field? What is Midjourney? A company or a community? How would you describe it?
David Holz: My name is David Holz and I'm a serial entrepreneur. In short, here's my story: I ran a design business in high school, studied physics and math in college, and worked on a PhD in fluid mechanics while working at NASA and Max Planck. At one point I was overwhelmed and put everything on hold. So I moved to San Francisco and started a tech company called Leap Motion around 2011. We sold devices that did motion capture of the hands, and we opened up much of the gestural interface space.
I founded Leap Motion and ran it for 12 years, [but] eventually I wanted a different environment from a big venture-backed company, so I left and started Midjourney. Right now it's small: we're only 10 people, we have no investors, and we're not financially motivated. We're under no pressure to sell something or become a public company. It's just about having a home for the next 10 years to work on cool projects that matter, hopefully not just to me but to the world, and to have fun.
We're working on a lot of different projects, and it's going to be a broad and diverse lab. But we do have themes: reflection, imagination, and collaboration. We've started to become known for this image creation work, and we don't think it's really about art or about making deepfakes. It's about how we expand the human imagination. What does that mean? What does it mean when computers are better at visual imagination than 99% of humans? It doesn't mean we will stop imagining. Cars are faster than humans, but that doesn't mean we've stopped walking. When we're moving large amounts of goods over long distances, we need engines, whether that's planes, ships, or cars. We see this technology as an engine for the imagination, so it's a very positive and humanistic thing.
In 10 years, you'll be able to buy an Xbox with a giant AI processor, and all the games will be created out of dreams
Q: There are lots of labs and companies working on turning words into images. Google has Imagen, OpenAI has DALL-E, and there are small projects like Craiyon. Where did this technology come from, where do you see it going in the future, and how does Midjourney's vision differ from others in this space?
David Holz: There were two major breakthroughs in AI that led to the emergence of image generation tools. One was understanding language and the other was the ability to create images. When you combine the two, you can create images from an understanding of language. We're seeing these technologies emerge, we're seeing the trends, and these technologies are going to be better at making images than humans, and very fast. In the next year or two, you'll be able to make content in real time, at 30fps and high resolution. It will be expensive, but it will be possible. Then, in ten years, you'll be able to buy an Xbox with a giant AI processor, and all the games will be dreams.
From the point of view of the raw technology, these are just facts and there is no way around them. But what does this really mean from a human perspective? What does it mean to say "all games are dreams, everything is malleable, we're going to have AR headsets"? The human side of this is hard to fathom, and the software that would let us actually wield it simply does not exist yet. We think that is our focus.
We started testing the raw technology last September and immediately noticed something unexpected. We soon discovered that most people don't know what they want. You say, "Here's a machine. You can imagine anything with it. What do you want?" They say, "Dog." You ask, "Really?" and they answer, "Pink dog." So you give them a picture of a dog, they take it, and they go off to do something else.
However, if you put them in a group, one person says "dog," someone else says "space dog," and another person might say "Aztec space dog." Then, all of a sudden, people understand the possibilities, and you're creating this augmented imagination, an environment where people can learn and exercise this new ability. What we found is that people really like to imagine together, so we made Midjourney social. We have a huge Discord community, one of the largest on Discord, with over 1.4 million people imagining things together in these shared spaces.
Q: Do you think of this human collective as running in parallel to a machine collective? As a counterbalance to these AI systems?
David Holz: Well, there's no real collective of machines. Every time you ask the AI to make a picture, it doesn't really remember or know anything about what it has made before. It has no will, no goals, no inclinations, no ability to tell a story. All the self, the will, and the stories are ours. It's like an engine: an engine doesn't go anywhere by itself, but people have places to go. It's kind of like a human hive mind, powered by super technology.
Within the community there are millions of people making images, and they are riffing on each other. By default, everyone can see everyone else's images. You have to pay extra to opt out of the public community, and if you do, it usually means you're some type of business user. So everybody is riffing on each other, and all these new aesthetics are emerging, almost like aesthetic accelerationism. They keep churning, and they're not AI aesthetics but new, interesting human aesthetics that I think will spread all over the world.
Q: Does this openness also help keep things safe? There has been a lot of talk about AI image generators being used to create potentially harmful material, whether that's viscerally unpleasant images such as gore and violence, or misinformation. How do you stop that from happening?
David Holz: Yeah, it's amazing. When you put someone's name on all the pictures they make, they become more careful about how they use the tool, and that helps a lot.
Even so, we still have some problems. Unfortunately, the way social media works elsewhere, you can make a living by provoking outrage, so some people have an incentive to join the community, pay for privacy, and then spend a month trying to create the most outrageous and horrifying images they can and post them on social media. Then we have to put a stop to it and say, "This is not what we're about, this is not the type of community we want."
Every time we see it, we clear it out. Where necessary, we block sensitive words: we have collected terms such as "hyperrealism," and we ban anything close to them.
Q: Well, what about real faces? Since this is another vehicle for misinformation, will the model produce real faces?
David Holz: It will generate celebrity faces and things like that. But that isn't what it usually produces. We have a default style and look that is artistic and beautiful, and it's hard to push the model away from that, which means you can't currently force it to make a deepfake. Maybe if you spent 100 hours trying, you could find the right combination of words to make something look real, but you would have to work hard to make it look like a photo. Personally, I don't think the world needs more deepfakes, but it does need more beautiful things, so we focus on making things beautiful and artistic.
There are only 20 or so models trained in the whole field, so this is experimental science
Q: Where did you find the training data for the model?
David Holz: Our training data is basically the same as everybody else's, and a lot of it comes from the Internet. Almost every large AI model pulls in all the data, text, and images it can get. Scientifically speaking, we're at an early stage in this field: everyone grabs everything they can, puts it in a huge file, and uses it to train something huge, but no one really knows which parts of the data actually matter.
For example, our recent update made things look a lot better, and you might think we did that by pouring a lot of paintings into the training data. But we didn't. We just used user data based on what people liked to make with the model; no human-made art was added. Scientifically, though, we're at a very early stage, and only a few dozen models have been trained in the whole field, so this is experimental science.
Q: How much does your training cost?
David Holz: For training models in this area, I can't talk about specific costs, but I can speak in general terms. Right now, training an image model costs about $50,000 per run, and you never get it right on the first try, so you have to do it three or ten or twenty times. You really do need a lot of attempts. The upshot is that it's expensive, more than most universities can afford, but not so expensive that you need a billion dollars or a supercomputer.
I'm pretty sure both training and running costs will come down, but right now it's actually very expensive to run. Every image costs money, and every image is generated on a $20,000 server that we have to rent by the minute. I don't think there has ever been a consumer service where people use thousands of trillions of operations in 15 minutes without thinking about it. Possibly ten times that. I think it's more computing power than anything the average consumer has ever touched, which is actually kind of crazy.
Q: When it comes to training data, one controversial aspect is the question of ownership. Current US law states that AI-generated art cannot be copyrighted, but it is less clear whether people can claim copyright over images used in training data. Artists and designers work hard to develop a particular style, but what happens if their work can now be replicated by an AI? Have you had discussions about this?
David Holz: There are a lot of artists in our community, and it's fair to say that their overall response to this tool is positive. They think it will make them more productive and improve their lives dramatically. We keep talking to them and asking, "Are you okay? Do you find this tool useful?" We also hold office hours, where I spend four hours on voice chat with 100 people, just answering their questions.
A lot of well-known artists use the platform, and they all say the same thing: it's fun. They say, "I think Midjourney is an art student. It has its own style, and when you use my name to create an image, it's like asking an art student to create something inspired by my art. In general, as an artist, I want people to be inspired by what I do."
Q: But there must be a huge self-selection bias, because the artists who are active on the Midjourney Discord server are definitely the ones who are excited about it. What about the people who say, "This is nonsense, I don't want my art to be swallowed up by these giant machines"? Would you allow these people to remove their work from your system?
David Holz: We don't have a process for that yet, but we're open to it. So far, I would say there aren't that many artists in the community, and the data sets aren't that deep. The artists who work with our models keep telling us things like, "We're not really scared by this." Right now it's all so new that I think it makes sense to play it by ear and stay flexible, so we talk to people a lot. In fact, one of the biggest requests we get from artists these days is that they want it to be better at stealing their styles, so they can use the model as part of their artistic process, which I find surprising.
Other [AI image] generators might be a little different, because they try to create something that looks the same, but we have our default style, so the results look like something an art student would create after being inspired by someone else. The reason we do this is that you always have a default: if you say "dog" and we give you a photo of a dog, that's kind of boring. From a human point of view, why would you want that? You could just go to a search engine and search for images. So we wanted to make something that looked artistic.
Q: You've mentioned Midjourney's default style several times in our conversation, and I'm interested in the idea that each AI image generator is its own cultural microcosm, with its own preferences and its own means of expression. How would you describe Midjourney's particular style, and how did you deliberately develop it?
David Holz: (laughs) That's a bit of a pointed question. We tried a lot of things, rendering a thousand images every time we tried something new, and there was no real intention behind it. It should look beautiful, and it should respond to both concrete things and vague things. We definitely don't want it to look like a photo. We might make a realistic version at some point, but we don't want that to be the default. Perfect photos make me a little uncomfortable right now, though I can see legitimate reasons for wanting something more realistic.
I think the style is somewhat whimsical, abstract, and weird. It tends to blend things in ways you might not have asked for, in ways that are surprising and beautiful. It tends to use a lot of blue and orange. It has a few favorite colors and favorite faces, and if you give it a vague prompt, it will pick the ones it likes best. We don't know why this happens, but it likes to draw one particular woman's face. We don't know which of the 12 training datasets she comes from, but people just call her "Miss Journey." There's also a man's face, somewhat square and imposing, that shows up as well, though he doesn't have a name yet. It's like an artist with its own faces and preferred colors.
Q: Speaking of these defaults, one of the big issues that image generation has to deal with is bias. Studies have shown that if you ask an AI image model to draw a CEO, the CEO will always be a white man, and when you ask it to draw a nurse, the nurse will always be a woman, often a person of color. How do you deal with this challenge? Is it a significant issue for Midjourney, or is it more of a concern for companies looking to monetize these systems?
David Holz: Well, Miss Journey is more of a problem than a feature, and we're working on some things right now that try to break up those stereotyped faces and give you more variety. But there are downsides to that. We have a version that completely gets rid of Miss Journey, but if you really want, let's say, Schwarzenegger playing Danny DeVito, it also breaks that request. The tricky part is making it work without wiping out whole types of expression. It's easy to add a switch that increases diversity, but it's hard to get it to turn on only at the right time.
All I can say is that whatever variety you want, it has never been so easy to change an image: you just need one word. There is always just one word between you and what you want to create. For example, I once used "African cyberpunk wizard" in a prompt. The result looked beautiful and cool, and it only took one word to tell the model what I wanted.
People totally misunderstand what AI is
Q: Stepping back, you've described much of the work you do at Midjourney in terms that are, let's say, not practical. I mean, of course the tools are practical, but your motivation is more abstract: it's about the relationship between humans and AI and how we use AI in a human way. Some people in the AI field tend to view the technology in the grandest terms, likening it to a god. How do you feel about that?
David Holz: For a while I kept trying to figure out the question "What is Midjourney's AI image generator?" You could say it's like an engine for the imagination, but there's something else going on here. The first temptation is to look at it from an artistic point of view and ask: is this like the invention of photography? When photography was invented, painting suddenly seemed strange. Anyone could take a picture of a face, so why paint one anymore?
Is that what this is? No, it's definitely not. It must be something even weirder. Right now, it feels more like the invention of an engine: you can make tons of images every minute, and you're galloping down the road of imagination, and it feels good. But if you go a step further and, instead of making four images at a time, you make 1,000 or 10,000, it's different. One day I did that. I made 40,000 images in a few minutes, and suddenly there was this vast expanse of nature in front of me, all these different creatures and environments. It took me four hours to look through them, and the process felt like drowning. I felt like a little kid looking into the deep end of the pool, knowing that I couldn't swim and suddenly aware of how deep it was. All of a sudden, Midjourney didn't feel like an engine; it felt like a torrent. It took me a few weeks to process. I thought about it and thought about it, and I realized that it is actually water.
Right now, people completely misunderstand what AI is. They see AI as a tiger: it's dangerous, it could eat me, it's an adversary. Water is dangerous too, and you can drown in it, but the danger of a fast-moving river is very different from the danger of a tiger. Water is dangerous, yes, but you can swim in it, you can take a boat on it, you can dam it to make electricity. Water is dangerous, but it is also a driving force of civilization, and we humans, who know how to live and work with water, are better off for it. It's an opportunity. Water has no will and holds no grudge. Yes, you can drown, but that doesn't mean we should ban water. When you discover a new source of water, it's really a good thing.
Q: So Midjourney is a new water source?
David Holz: Yeah, it's kind of scary when you say it that way.
I think that together, as a species, we've discovered a new source of water, and what Midjourney is trying to figure out is how we use this water for people. How do we teach people to swim in it, and how do we build boats? How do we build dams, and how do we go from being afraid of drowning to being the surfing kids of the future? We're making surfboards, not the water, and I think that has profound implications.