LTX-2 is Lightricks' audio-video generative model — distinct from LTX Video 2.3, which handles video-only generation. LTX-2 produces synchronized audio-video output from text prompts, with LoRA training support for fine-tuning on custom styles, motions, characters, and visual effects. The model is open-weight, with training code available in the official Lightricks GitHub repository.
For teams who want to train custom LTX-2 LoRAs without setting up local infrastructure, a small number of online trainers now offer this capability. This guide covers what's available in 2026 for online LTX-2 video LoRA training, what each service provides, and how to evaluate options for your use case.
LTX-2 vs. LTX Video 2.3: Understanding the Distinction
LTX Video 2.3 (LTXV 2.3) is Lightricks' dedicated video generation model — text and image to video, no audio component. It has been widely deployed via fal.ai, Replicate, and other inference APIs, and has a mature LoRA training ecosystem with documented endpoints for fine-tuning on custom motion, character, and style data.
LTX-2 is a newer model that adds audio-video joint generation: it produces video with synchronized audio from text prompts, and supports LoRA training for customizing its generation behavior across both modalities simultaneously. LTX-2 is available as a 19B parameter model and is currently in active development — the LTX-2 package is maintained separately from the LTXV 2.3 codebase.
For LoRA training purposes: if your goal is to fine-tune for video style, motion, or character without audio requirements, LTX Video 2.3 has a more mature and widely deployed training ecosystem. If you need audio-video synchronization in your generated output — custom sonic environments, motion-audio correlation, or audio-driven visual effects — LTX-2 is the relevant model.
Online LTX-2 LoRA Training Services (2026)
WaveSpeedAI
WaveSpeedAI offers the most developed consumer-facing LTX-2 LoRA training interface currently available. Their LTX-2 19B Video-LoRA Trainer accepts a ZIP file of training videos with optional audio tracks and produces a custom LoRA adapter for the LTX-2 model. A separate IC-LoRA trainer handles image-conditioned video-to-video transformation training. WaveSpeedAI also offers an LTX-2 LoRA fine-tuning guide covering training data requirements, parameter configuration, and output evaluation.
WaveSpeedAI's interface is designed for practitioners comfortable with training parameters — LoRA rank, training steps, learning rate, and dataset configuration are exposed. It is not a fully guided no-code experience but provides more user-facing documentation than raw API access.
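To make those knobs concrete, here is a representative parameter set for a character LoRA run, expressed as a plain Python dict. The field names and values are illustrative defaults drawn from common LoRA practice, not WaveSpeedAI's actual request schema.

```python
# Illustrative LTX-2 LoRA training configuration. Field names and values are
# assumptions based on common LoRA practice, not WaveSpeedAI's real schema.
training_config = {
    "lora_rank": 64,          # adapter capacity: higher rank is more expressive, larger file
    "training_steps": 2000,   # total optimizer steps over the dataset
    "learning_rate": 1e-4,    # a common starting point for LoRA fine-tuning
    "dataset": {
        "zip_path": "character_clips.zip",  # 10-50 clips with a consistent subject
        "include_audio": True,              # LTX-2 learns both modalities jointly
    },
}
```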
fal.ai LTX-2 Video Trainer
fal.ai hosts an LTX-2 Video Trainer endpoint with a user guide and API documentation. The trainer accepts uploaded video datasets and returns a trained LoRA adapter. fal.ai's approach is API-first: the primary interface is programmatic, with a dashboard available for monitoring training jobs. For developers building training pipelines or testing fine-tuned models via API, fal.ai is well-documented and reliable. Non-technical users who want a guided no-code experience will find that the interface assumes more familiarity with training concepts than a wizard-style trainer would.
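As a sketch of that API-first flow, here is how a training job might be submitted with fal.ai's Python client. The `upload_file` and `subscribe` calls are real fal_client functions, but the endpoint id and argument names below are assumptions for illustration; check the LTX-2 Video Trainer documentation for the actual schema.

```python
# Minimal sketch of submitting an LTX-2 LoRA training job via fal.ai's Python
# client. The endpoint id and argument names are assumptions, not the
# documented schema -- consult fal.ai's LTX-2 Video Trainer docs.
import fal_client

# Upload the zipped training dataset to fal's storage; returns a URL.
dataset_url = fal_client.upload_file("training_clips.zip")

# subscribe() blocks until the job finishes and returns the result payload.
result = fal_client.subscribe(
    "fal-ai/ltx-2/trainer",                # hypothetical endpoint id
    arguments={
        "training_data_url": dataset_url,  # hypothetical argument names
        "steps": 2000,
        "learning_rate": 1e-4,
        "lora_rank": 64,
    },
)
print(result)  # expected to include a URL to the trained .safetensors LoRA
```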
ComfyUI LTX-2 Trainer (jaimitoes/ComfyUI-LTX2-TRAINER)
An open-source ComfyUI custom node set for LTX-2 LoRA training. It is not a hosted service: it requires a local ComfyUI installation with adequate GPU resources and significant setup effort. It is most relevant for teams with existing ComfyUI infrastructure who want training integrated into their node workflows.
Official Lightricks Training Package
The official LTX-2 repository on GitHub includes the ltx-trainer package with full LoRA training documentation. This is the reference implementation — it requires Python environment setup, GPU hardware, and familiarity with PyTorch training workflows. Suitable for research teams and ML engineers; not a no-code solution.
What to Look for in an LTX-2 No-Code Trainer
For video creators, indie filmmakers, and game studios who want to create custom LTX-2 LoRAs without managing infrastructure, the ideal online trainer has several characteristics:
Guided dataset preparation. Training quality depends heavily on dataset quality: consistent framing, appropriate video length, caption quality. A no-code trainer should provide guidance on dataset preparation, ideally with automated video captioning (using a model like Florence-2 or similar, sketched after this list) rather than requiring manual captioning of every clip.
Use-case-specific recipes. LTX-2 LoRA training behaves differently depending on the target: training a motion style LoRA requires different configuration than training a character identity LoRA or a visual style LoRA. Preset recipes that pre-configure training parameters for each use case significantly reduce the expertise required to get good results.
In-platform testing. A LoRA that trains successfully doesn't necessarily produce good inference results. Testing should be available directly in the training platform — input a prompt, generate a test video with the trained LoRA loaded, evaluate the result — without requiring a separate inference API setup.
Credit-based pricing with clear estimates. LTX-2 training is more resource-intensive than LTXV 2.3 training due to the audio-video joint model size. Training costs should be transparent before job submission, with a credit estimate based on dataset size and training configuration.
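On the captioning point from the first item above: here is a minimal sketch of how a platform might auto-caption training clips with Florence-2, following the usage shown on its Hugging Face model card. Florence-2 is an image model, so this captions a single sampled frame per clip; a production captioner would sample several frames and merge the results.

```python
# Sketch of automated clip captioning with Florence-2, per its model card
# usage. Captions one representative frame (e.g., the clip's middle frame,
# extracted beforehand with OpenCV or ffmpeg) as a stand-in for the clip.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", torch_dtype=torch.float16, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)

def caption_frame(frame: Image.Image) -> str:
    """Generate a detailed caption for one sampled video frame."""
    task = "<DETAILED_CAPTION>"
    inputs = processor(text=task, images=frame, return_tensors="pt").to(
        device, torch.float16
    )
    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
        do_sample=False,
    )
    text = processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(text, task=task, image_size=frame.size)
    return parsed[task]
```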
Grix LoRA Trainer: LTX Video 2.3 and LTX-2 Roadmap
Grix is building a consumer-facing LoRA trainer at grixai.com/lora targeting the no-code use case for both LTX Video 2.3 and LTX-2. The trainer is designed around use-case recipes (Character, Style, Motion, Product, Face, World), automated video captioning, and an integrated Studio for testing trained LoRAs immediately after training — the same platform that handles generation.
LTX Video 2.3 LoRA training is the launch target, with LTX-2 audio-video training on the near-term roadmap as the model matures and fal.ai endpoint support stabilizes. If you're building for a workflow that requires LTX-2 audio-video LoRAs specifically, sign up at grixai.com/try to be notified when LTX-2 training support ships.
For existing LTX Video 2.3 LoRA training needs, without audio-video requirements, see the LTX Video LoRA trainer guide and the LTX-2 LoRA trainer overview for a current comparison of capabilities and pricing.
Training Data Requirements for LTX-2 LoRAs
LTX-2 LoRA training data requirements follow similar patterns to LTXV 2.3 with additional considerations for audio:
Video clips: 10-50 training clips is the typical range for character and style LoRAs. More clips are needed for motion-specific training where the motion pattern must be learned across diverse source contexts. Clips should be consistent in the feature being trained: consistent character identity, consistent motion type, consistent stylistic treatment.
Audio considerations: For audio-video LoRAs, audio quality in training data directly affects output audio quality. Consistent audio character across training clips — consistent vocal tone for a character voice LoRA, consistent sonic texture for an environmental audio style — produces better trained LoRA behavior than mixed-quality source audio.
Video length: 3-10 second clips are typical for motion and style training. Longer clips increase training cost without proportional quality gain for most use cases. Clips that are too short (under 2 seconds) may not provide sufficient temporal context for motion learning. A pre-flight duration check is sketched after this list.
Captions: captions generated by a vision-language model, describing each clip's content, style, and motion, improve training quality significantly over training without captions. Automated captioning tools (Florence-2, GPT-4 Vision) can generate training captions at scale; this is a key differentiator for no-code trainer platforms that handle captioning automatically.
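As referenced above, a small pre-flight check can flag clips outside the 2-10 second window before upload. This sketch assumes ffprobe (bundled with FFmpeg) is on the PATH and that clips live in a training_clips/ directory; both are assumptions for illustration. The ffprobe flags used are standard options.

```python
# Pre-flight duration check for a video LoRA training dataset.
# Assumes ffprobe is installed; the 2-10 second window follows the
# guidance in the list above.
import json
import subprocess
from pathlib import Path

def clip_duration_seconds(path: Path) -> float:
    """Read a clip's duration from ffprobe's JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(json.loads(out)["format"]["duration"])

for clip in sorted(Path("training_clips").glob("*.mp4")):
    d = clip_duration_seconds(clip)
    if d < 2.0:
        print(f"SKIP  {clip.name}: {d:.1f}s is too short for motion learning")
    elif d > 10.0:
        print(f"TRIM  {clip.name}: {d:.1f}s exceeds the typical 3-10s window")
    else:
        print(f"OK    {clip.name}: {d:.1f}s")
```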
Current State of the LTX-2 Ecosystem
As of April 2026, the LTX-2 online training ecosystem is early-stage. WaveSpeedAI and fal.ai offer the most developed hosted options; the broader consumer-facing no-code market has not yet converged on a clear leading platform as it has for image LoRA training (where Replicate, fal.ai, and Civitai Training dominate). The LTX-2 model itself continues active development — additional capabilities and improved training tooling from Lightricks are expected as the model matures.
For teams needing LTX-2 LoRA training now: WaveSpeedAI and fal.ai are the most functional hosted options. For teams willing to wait for a fully guided no-code experience: the Grix LoRA trainer roadmap includes LTX-2 support alongside the current LTXV 2.3 launch offering. See grixai.com/pricing for credit pack pricing.
Frequently Asked Questions
What is the difference between LTX-2 and LTX Video 2.3 for LoRA training?
LTX Video 2.3 is Lightricks' video-only model — text and image to video, no audio. LTX-2 is their audio-video joint model, producing video with synchronized audio from text prompts. LoRA training for LTX-2 can influence both visual and audio generation simultaneously. For video-only LoRA training, LTXV 2.3 has a more mature hosted ecosystem; for audio-video LoRAs, LTX-2 is the relevant model.
Can I train an LTX-2 LoRA without a GPU?
Yes, via cloud-hosted training services like WaveSpeedAI and fal.ai. You upload a dataset of training videos, configure parameters, and the platform handles training on their GPU infrastructure. You receive a .safetensors LoRA file as output. No local GPU required.
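As a usage note, here is one way the returned .safetensors file can be loaded for local inference, using the diffusers LTX-Video pipeline. This illustrates the video-only LTXV path; LTX-2 audio-video inference runs through Lightricks' own tooling, and a LoRA from a hosted trainer may need key-format conversion before diffusers accepts it. Inference, unlike the hosted training, does require a GPU.

```python
# Loading a trained LoRA for local inference with diffusers' LTX-Video
# pipeline (video-only LTXV path, shown for illustration).
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("my_lora.safetensors", adapter_name="custom_style")
pipe.to("cuda")

frames = pipe(
    prompt="a character walking through rain, custom_style",
    num_frames=81,  # LTX expects num_frames such that (n - 1) is divisible by 8
).frames[0]
export_to_video(frames, "lora_test.mp4", fps=24)
```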
How long does LTX-2 LoRA training take online?
Training time varies by dataset size, LoRA rank, and training steps. Typical runs on WaveSpeedAI and fal.ai take 20-60 minutes for small datasets (10-20 clips) at standard training configurations. Larger datasets or higher step counts extend this proportionally.
Will Grix support LTX-2 LoRA training?
LTX-2 training is on the Grix LoRA trainer roadmap. LTX Video 2.3 is the launch target. Sign up at grixai.com/try for updates as LTX-2 support ships.
What's the minimum dataset size for a useful LTX-2 LoRA?
10-15 well-prepared clips can produce a useful LoRA for style and motion tasks. Character identity LoRAs benefit from 20-30 clips showing the character in varied contexts and angles. More data generally produces more robust generalization, with diminishing returns beyond 50-100 clips for most use cases.