Use when working with HuggingFace Transformers: pre-trained models, the pipeline API, text generation, text classification, question answering, NER, fine-tuning transformers, AutoModel classes, or the Trainer API.
npx skill4agent add eyadsibai/ltk transformers

| Task | Pipeline Name | Output |
|---|---|---|
| Text Generation | text-generation | Completed text |
| Classification | text-classification | Label + confidence |
| Question Answering | question-answering | Answer span |
| Summarization | summarization | Shorter text |
| Translation | translation | Translated text |
| NER | ner | Entity spans + types |
| Fill Mask | fill-mask | Predicted tokens |
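Every pipeline returns plain Python lists and dicts, so post-processing needs no framework code. A minimal sketch, assuming `transformers` is installed; the model-loading call is commented out because it downloads weights on first use, and `top_label` is a hypothetical helper, not part of the library:

```python
def top_label(results):
    """Pick the highest-scoring prediction from a classification output
    of shape [{'label': str, 'score': float}, ...]."""
    return max(results, key=lambda r: r["score"])["label"]

# from transformers import pipeline
# clf = pipeline("text-classification")   # downloads a default model
# print(top_label(clf("Great library!")))

# The helper works on the documented output shape without a model:
sample = [{"label": "POSITIVE", "score": 0.98},
          {"label": "NEGATIVE", "score": 0.02}]
print(top_label(sample))  # POSITIVE
```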

| Task | Pipeline Name | Output |
|---|---|---|
| Image Classification | image-classification | Label + confidence |
| Object Detection | object-detection | Bounding boxes |
| Image Segmentation | image-segmentation | Pixel masks |

| Task | Pipeline Name | Output |
|---|---|---|
| Speech Recognition | automatic-speech-recognition | Transcribed text |
| Audio Classification | audio-classification | Label + confidence |
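Vision and audio pipelines also return plain dicts. A hedged sketch of working with the object-detection output shape (list of `{'score', 'label', 'box'}` dicts); the pipeline call is commented out, and `filter_detections` is a hypothetical helper:

```python
def filter_detections(detections, min_score=0.9):
    """Keep object-detection hits above a confidence threshold.
    Expects the documented output shape:
    [{'score': float, 'label': str,
      'box': {'xmin', 'ymin', 'xmax', 'ymax'}}, ...]"""
    return [d for d in detections if d["score"] >= min_score]

# from transformers import pipeline
# detector = pipeline("object-detection")
# hits = filter_detections(detector("street.jpg"))  # path is a placeholder

sample = [
    {"score": 0.97, "label": "car",
     "box": {"xmin": 10, "ymin": 5, "xmax": 90, "ymax": 60}},
    {"score": 0.40, "label": "dog",
     "box": {"xmin": 0, "ymin": 0, "xmax": 5, "ymax": 5}},
]
print([d["label"] for d in filter_detections(sample)])  # ['car']
```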

| Class | Use Case |
|---|---|
| AutoModel | Base model (embeddings) |
| AutoModelForCausalLM | Text generation (GPT-style) |
| AutoModelForSeq2SeqLM | Encoder-decoder (T5, BART) |
| AutoModelForSequenceClassification | Classification head |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Extractive QA |
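Choosing the Auto class determines which task head is attached. A sketch of the loading pattern, with the actual downloads commented out; the lookup dict just restates the table so code can select a class name programmatically:

```python
# Task -> Auto* class name, from the table above.
TASK_TO_AUTOCLASS = {
    "embeddings": "AutoModel",
    "text-generation": "AutoModelForCausalLM",
    "seq2seq": "AutoModelForSeq2SeqLM",
    "classification": "AutoModelForSequenceClassification",
    "token-classification": "AutoModelForTokenClassification",
    "question-answering": "AutoModelForQuestionAnswering",
}

# from transformers import AutoTokenizer, AutoModelForCausalLM
# tok = AutoTokenizer.from_pretrained("gpt2")           # tokenizer must match model
# model = AutoModelForCausalLM.from_pretrained("gpt2")  # GPT-style generation head

print(TASK_TO_AUTOCLASS["text-generation"])  # AutoModelForCausalLM
```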

| Parameter | Effect | Typical Values |
|---|---|---|
| max_new_tokens | Output length | 50-500 |
| temperature | Randomness (lower = more deterministic; needs do_sample=True) | 0.1-1.0 |
| top_p | Nucleus sampling threshold | 0.9-0.95 |
| top_k | Limit vocabulary per step | 50 |
| num_beams | Beam search (disable sampling) | 4-8 |
| repetition_penalty | Discourage repetition | 1.1-1.3 |
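These parameters are passed as keyword arguments to `model.generate()` or to a text-generation pipeline call. A sketch with values picked from the typical ranges above (the values themselves are illustrative, and the generate call is commented out because it assumes a loaded model):

```python
# Sampling configuration: temperature/top_p/top_k only take effect
# when do_sample=True.
gen_kwargs = {
    "max_new_tokens": 200,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.1,
}

# Beam search is the alternative to sampling, not a combination with it:
beam_kwargs = {"num_beams": 4, "do_sample": False}

# outputs = model.generate(**inputs, **gen_kwargs)  # model/inputs assumed loaded
print(sorted(gen_kwargs))
```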

| Option | When to Use |
|---|---|
| device_map="auto" | Let library decide GPU allocation |
| device_map="cuda:0" | Specific GPU |
| device_map="cpu" | CPU only |

| Method | Memory Reduction | Quality Impact |
|---|---|---|
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Small; needs calibration data |
| AWQ | ~75% | Small; activation-aware quantization |
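The reductions in the table follow directly from bits per parameter: relative to fp16 (16 bits), 8-bit halves weight memory and 4-bit quarters it. A back-of-envelope estimate for a 7B model, ignoring overhead (KV cache, activations, quantization scales are extra); the commented `BitsAndBytesConfig` line shows how 4-bit loading is requested in practice:

```python
def model_memory_gb(n_params, bits_per_param):
    """Rough weight-memory estimate: parameters x bits per parameter."""
    return n_params * bits_per_param / 8 / 1024**3

n = 7_000_000_000  # a 7B-parameter model
fp16 = model_memory_gb(n, 16)
int8 = model_memory_gb(n, 8)
int4 = model_memory_gb(n, 4)
print(f"fp16 ~{fp16:.1f} GB, 8-bit ~{int8:.1f} GB, 4-bit ~{int4:.1f} GB")
# 8-bit is 50% of fp16; 4-bit is 25%, matching the table.

# from transformers import BitsAndBytesConfig
# bnb = BitsAndBytesConfig(load_in_4bit=True)
# model = AutoModelForCausalLM.from_pretrained(name, quantization_config=bnb)
```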
torch_dtype="auto" (use the dtype stored in the checkpoint)

| Argument | Purpose | Typical Value |
|---|---|---|
| num_train_epochs | Training passes | 3-5 |
| per_device_train_batch_size | Samples per GPU | 8-32 |
| learning_rate | Step size | 2e-5 for fine-tuning |
| weight_decay | Regularization | 0.01 |
| warmup_ratio | LR warmup | 0.1 |
| eval_strategy (evaluation_strategy in older versions) | When to eval | "epoch" or "steps" |
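These arguments go straight into `TrainingArguments`. A hedged sketch using the typical values from the table (`output_dir` is a placeholder, and the `Trainer` calls are commented out because they assume a model and datasets):

```python
# TrainingArguments values from the table above.
train_kwargs = {
    "output_dir": "./results",          # placeholder checkpoint directory
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "eval_strategy": "epoch",           # evaluation_strategy in older versions
}

# from transformers import TrainingArguments, Trainer
# args = TrainingArguments(**train_kwargs)
# Trainer(model=model, args=args,
#         train_dataset=train_ds, eval_dataset=eval_ds).train()
print(sorted(train_kwargs))
```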

| Strategy | Memory | Quality | Use Case |
|---|---|---|---|
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |
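LoRA's low memory cost comes from its parameter count: for a d_in × d_out weight matrix it trains only two small adapters, A (d_in × r) and B (r × d_out), instead of the full matrix. The arithmetic below uses an illustrative 4096 × 4096 projection; the commented lines sketch how this is applied with the `peft` library:

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one weight matrix:
    A (d_in x r) plus B (r x d_out)."""
    return r * (d_in + d_out)

full = 4096 * 4096                    # full fine-tuning of one projection
lora = lora_params(4096, 4096, r=8)   # rank-8 adapters on the same matrix
print(f"full: {full:,}  lora(r=8): {lora:,}  ratio: {lora / full:.4%}")

# from peft import LoraConfig, get_peft_model
# config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
# model = get_peft_model(model, config)  # model assumed loaded
```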

| Parameter | Purpose |
|---|---|
| padding | Make sequences same length |
| truncation | Cut sequences to max_length |
| max_length | Maximum tokens (model-specific) |
| return_tensors | Output format ("pt", "tf", "np") |
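These parameters are passed directly when calling the tokenizer on text. A sketch with common values (512 is BERT's limit; other models differ); the tokenizer load and call are commented out because they download vocabulary files:

```python
# Common tokenizer call arguments, from the table above.
tok_kwargs = {
    "padding": True,          # pad the batch to its longest sequence
    "truncation": True,       # cut sequences at max_length
    "max_length": 512,        # model-specific (e.g. 512 for BERT)
    "return_tensors": "pt",   # PyTorch tensors; "tf" and "np" also accepted
}

# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# batch = tok(["short", "a much longer sentence"], **tok_kwargs)
# batch["input_ids"] then has shape (2, padded_length)
print(sorted(tok_kwargs))
```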

| Practice | Why |
|---|---|
| Use pipelines for inference | Handles preprocessing automatically |
| Use device_map="auto" | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |