HIROYUKI HAYASHI - Study MOV using generative AI

THE EMPIRE COMING!
ほぼ、1年のブランクを空けて、ミッドジャーニーのstill2video 機能をテスト。一枚の生成AI画像から動画を作る処理は素晴らしい。開発期間も長かったがそれだけの事はあるように思う。描かれたモチーフの物理的な認識優れているのだろう。価格も良心的。一方、状況といえば、「IP帝国」に訴えられている。負ければ現状のコストでこのクオリティを提供するのは難しいだろう。核兵器と似通った顛末。

Almost, after a year blank, tested the still2video feature in Mid Journey. The process of creating a video from a single generated AI image is excellent. It took a long time to develop, but I think it's worth it. The physical recognition of the drawn motifs must be excellent. The price is also reasonable. On the other hand, speaking of the situation, they are being sued by an “IP empire. If they lose, it will be difficult to offer this quality at the current cost. The end result is similar to that of nuclear weapons.

2025

Succulent intrigue
RANWAYのGen3Alphaでの1枚の写真から生成。
プロンプトは期待通りには機能しないが、興味深く、意外性がある。
高い生成クオリティを持っているのに、作者が望む方向性をテキストで繰り返し探さなければならないのは、ツールとして使うにはイラつくかもしれない。
現状、多くのAIビジュアル生成と同じように、完成したアウトプットからインスピレーションを得る方が健康的だ。

Generated from a single photo at Runway's Gen3Alpha. The prompts don't work as expected, but they are interesting and surprising. While it has a high quality of generation, it can be frustrating to use as a tool to have to repeatedly look for the author's desired direction in the text. Currently, as with many AI generators, it is healthier to draw inspiration from the finished output.

KLING. Yahoo.

Midjorneyで生成した、既存のスティル画像を片っ端から突っ込んでみる。
アップロードした画像の行先は少し心配なので、色々な意味で大丈夫な画像で試している。まだ、「Image2Video」しか試していない。
上手くイメージ通りに生成できる素材もあるけれど、全く言う事を聞いてくれない素材もある。
KLINGが想定した学習内容に、素材が収まっている分には（彼女の想像の出来る範囲では？）器用な結果を出力してくれる。
ただ、追加プロンプトの解析が上手くいかないと、いきなり、別の素材をカットで、ぶっ込んできたりする。
数点しか試していないけれど、「ペイントで表現された人物」において、オリエンタルに偏りがち。
また、大きな問題がある。
最初の10トライぐらいは一点、約5分ぐらいで処理が終わったのだがそこから後、1トライ、最低4時間は処理がかかっている。
99%完了からあと動かない。6時間以上かかっているトライアルもある。
入力した画像データに問題がるのかシステムとの相性かわからない。
ビジネス的に意図的に遅くしているのか、はたまた日本からのアクセスに問題が有るのか？
妙な疑念を抱いてしまう。
とはいえ、一般人が手を出せる環境は感謝したい。
ありがとう。中国。アメリカと中国からの「サブスク」で日本は消滅するかもしれない。
なんとかしなさい、政治家。

Experiment with the KLING system. Yahoo!
I'm going to try plugging in every existing still image generated by Midjorney.
I'm a little worried about the destination of the uploaded images, so I'm trying with images that are safe in many ways.
I have only tried "Image2Video" yet.
Some of the images are generated as I imagined, but some of them don't listen to me at all.
As long as the material fits into what KLING expects to learn (as far as she can imagine?), she is able to get the best out of the system. KLING is very dexterous in its output.
However, if the additional prompts are not parsed properly, however, the KLING system will suddenly cut another piece of material（Unknown chinese people） into the study.
Also, there is a big problem. the first 10 or so tries took about 5 minutes per item to process.
After that, it took at least 6 hours per trial. 99% of the trials were completed, and the rest did not move.
Is there a problem with the input image data?
I don't know if it's a compatibility issue with the system or not.
Is it a business problem, or is there a problem with access from Japan?
I have a strange suspicion.
Nevertheless, I am grateful for an environment where the general public can access the site.
Thank you, China.
Japan may disappear with subscriptions from the US and China.
Do something about it, politicians !

2024

The nightmare

MorphStudioというサイト。基本的にはText2Video。ほとんど同じプロンプトで生成するも、生成結果の振り幅が大きい。面白いけれども。Image2Videoもかなり自然に生成するらしい。また試してみようと思う。

A site called MorphStudio. Basically Text2Video. Even though they are generated using almost the same prompt, the generated results vary greatly. Interesting though. Image2Video also seems to be generated quite naturally. I think I'll try it again.

AI Delirium

即興で舞ってもらったクリップをRanwayのGen1:Video2Videoで生成。いろいろなテイストを遊んでみた。
当たり前っちゃ、当たり前だけど、動きのエッセンスは紛れもなくMs.IrOhaだったりする。
Ranwayのサブスクリプションは結構高価に感じる。個性的な処理が楽しめるけれど。
何をゴールに据えるかだけれど、Gen2の新しいText2Videoがかなりカット制作志向のツール。また、いずれ。

Video clips of improvised dancing were generated using "Ranway"'s Gen1:Video2Video. I tried playing with various tastes.
It may seem obvious, but the essence of the movement is unmistakably Ms.IrOha's.
Runway subscriptions feel expensive. Although you can enjoy the unique processing.
It depends on what your goal is, but Gen2's new Text2Video is a tool that is very cut-oriented. Also, chair.

Alien meeting

小芝居シリーズ。フローはUKに準ずるが「DI-D」の出力は入力のレゾリューションを上げてもHDに満たない。Ai処理の「VIDEO-ENHANCER.media.io」を使ってみた。通常のアップスケールよりも綺麗に上がる。４倍まで可能らしいが、サブスクプランの都合でHD24fにして、その後AEで画像処理を入れた。カッティングの後、UpScaleしたがカットの間でも特に問題なさそう。テロップは最後にPrで。

A series of small plays. The flow is similar to the UK, but the output of "DI-D" is less than HD even if the input is set to 4K resolution. I tried using Ai processing's "VIDEO-ENHANCER.media.io". It rises more beautifully than normal upscaling. It seems that up to 4K is possible, but due to the subscription plan, I changed "960*540" to "1080P24f" and then added image processing with AE. I used UpScale after the editing process, and there seems to be no problem between cuts. The telop is Pr in the final stage.

UK

基本のシナリオは自作だが翻訳は「ChatGPT」にお願いした。
発話は「elevenlabs.io」。「D-ID」が「Midjourney」のキャラクタを喋らせてくれる。「leiapix.com」でラフな1枚のdepthレイヤーを出したら、「Adobe」Aeで視差をちょっとつける。Ps,Prでちょっと調整。
このフローが最近、気に入っている。キャラクタが人を小馬鹿にしている感じの空気感が好き。

The basic scenario was made by myself, but I asked "ChatGPT" for the translation.
The utterance is "elevenlabs.io". "D-ID" makes the character of "Midjourney" speak. After creating a rough depth layer using ”leiapix.com”, I added a bit of parallax using Adobe Ae. A little adjustment with Ps and Pr.
I've been loving this flow lately. I like the atmosphere where the characters make fun of people.

AI:Diverse Distances

小さな実験を兼ねたArt系作品を集めた。音楽は「beatoven.ai」この時は比較的イメージに合った。SEを発生させてくれるツールには出会えず。
「Midjourney」に喋ってもらったのも、このクリップが最初。ちょっとイライラした娘を当ててみた。

A collection of Art-type works that also serve as small experiments. The music is "beatoven.ai", which at this time was relatively suitable for my image. I can't find a tool that causes SE.
This clip is also the first time I asked "Midjourney" to speak. She guessed the girl who was a little annoyed.

Fluffy

一時、「GoogleColab」で「StableDiffusion」を動かしてみた。woolitize(毛糸)に特化したモデルが面白かったので数枚、描いて見た。フワフワ感が可愛い。Runway2で動かして無理やりクリップにした。

Temporarily, I tried running "StableDiffusion" on "GoogleColab". A model specialized in woolitize was interesting, so I drew a few. Fluffy and cute. I forced it to clip by moving it on Runway2.

PowerGirl

Animeっぽい絵で喋らせて見た。「D-ID」のツールは正面を向いていると上手くいくが、少し、表情のある角度だと補完が破綻することを学習。顔の歪みをメッシュワープするしか無い。あんまり頭のパーツが大きいと頭の一部と認識しない。

I made it talk using anime-like pictures. The "D-ID" tool works well when facing the front, but learns that complementation breaks down at an angle with a slight expression. There is no choice but to mesh warp the distortion of the face. If the head part is too big, it will not be recognized as part of the head.

2023

You may also like