github.com/instill-ai/component@v0.16.0-beta/pkg/connector/huggingface/v0/README.mdx (about)

     1  ---
     2  title: "Hugging Face"
     3  lang: "en-US"
     4  draft: false
     5  description: "Learn how to set up a VDP Hugging Face connector https://github.com/instill-ai/instill-core"
     6  ---
     7  
     8  The Hugging Face component is an AI connector that allows users to connect to the AI models served on the Hugging Face Platform.
     9  It can carry out the following tasks:
    10  
    11  - [Text Generation](#text-generation)
    12  - [Fill Mask](#fill-mask)
    13  - [Summarization](#summarization)
    14  - [Text Classification](#text-classification)
    15  - [Token Classification](#token-classification)
    16  - [Translation](#translation)
    17  - [Zero Shot Classification](#zero-shot-classification)
    18  - [Question Answering](#question-answering)
    19  - [Table Question Answering](#table-question-answering)
    20  - [Sentence Similarity](#sentence-similarity)
    21  - [Conversational](#conversational)
    22  - [Image Classification](#image-classification)
    23  - [Image Segmentation](#image-segmentation)
    24  - [Object Detection](#object-detection)
    25  - [Image To Text](#image-to-text)
    26  - [Speech Recognition](#speech-recognition)
    27  - [Audio Classification](#audio-classification)
    28  
    29  ## Release Stage
    30  
    31  `Alpha`
    32  
    33  ## Configuration
    34  
    35  The component configuration is defined and maintained [here](https://github.com/instill-ai/component/blob/main/pkg/connector/huggingface/v0/config/definition.json).
    36  
    37  ## Connection
    38  
    39  | Field | Field ID | Type | Note |
    40  | :--- | :--- | :--- | :--- |
    41  | API Key (required) | `api_key` | string | Fill in your Hugging Face API token. To find your token, visit https://huggingface.co/settings/tokens. |
    42  | Base URL (required) | `base_url` | string | Hostname for the endpoint. To use the Inference API, set this to https://api-inference.huggingface.co; for an Inference Endpoint, set it to your custom endpoint. |
    43  | Is Custom Endpoint (required) | `is_custom_endpoint` | boolean | Set to true if you are using a custom Inference Endpoint rather than the Inference API. |
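
To make the three connection fields concrete, here is a minimal Python sketch of how they might map onto an HTTP request. It assumes the hosted Inference API addresses a model at `<base_url>/models/<model>` and that a custom Inference Endpoint is called at `base_url` directly; the helper name `build_request` and the token `hf_xxx` are hypothetical, and the connector's actual implementation may differ. The request is only built, never sent.

```python
import json
import urllib.request


def build_request(connection: dict, model: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request from the connection fields."""
    base_url = connection["base_url"].rstrip("/")
    if connection["is_custom_endpoint"]:
        # Assumption: a custom Inference Endpoint already serves one model.
        url = base_url
    else:
        # Assumption: the hosted Inference API selects the model by path.
        url = f"{base_url}/models/{model}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {connection['api_key']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


connection = {
    "api_key": "hf_xxx",  # placeholder, not a real token
    "base_url": "https://api-inference.huggingface.co",
    "is_custom_endpoint": False,
}
req = build_request(connection, "gpt2", {"inputs": "Hello"})
print(req.full_url)
```

With `is_custom_endpoint` set to true, the same helper would target `base_url` unchanged and ignore the model name in the URL.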
    44  
    45  ## Supported Tasks
    46  
    47  ### Text Generation
    48  
    49  Generating text is the task of producing new text. These models can, for example, fill in incomplete text or paraphrase.
    50  
    51  | Input | ID | Type | Description |
    52  | :--- | :--- | :--- | :--- |
    53  | Task ID (required) | `task` | string | `TASK_TEXT_GENERATION` |
    54  | Model (required) | `model` | string | The Hugging Face model to be used |
    55  | String Input (required) | `inputs` | string | String input |
    56  | Parameters | `parameters` | object | Parameters |
    57  | Options | `options` | object | Options for the model |
    58  
    59  | Output | ID | Type | Description |
    60  | :--- | :--- | :--- | :--- |
    61  | Generated Text | `generated_text` | string | The continued string |
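
As an illustration of the input table above, the following Python sketch assembles a Text Generation input and checks the required fields. The keys inside `parameters` (`max_new_tokens`, `temperature`) and `options` (`wait_for_model`, `use_cache`) follow the Hugging Face Inference API and are assumptions here; the keys this component accepts are defined in its configuration. The helper `missing_required_fields` is hypothetical.

```python
REQUIRED_FIELDS = ("task", "model", "inputs")


def missing_required_fields(component_input: dict) -> list:
    """Return the IDs of required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not component_input.get(field)]


text_generation_input = {
    "task": "TASK_TEXT_GENERATION",
    "model": "gpt2",
    "inputs": "The answer to the universe is",
    # Optional objects; key names are assumptions taken from the
    # Hugging Face Inference API, not confirmed for this component.
    "parameters": {"max_new_tokens": 20, "temperature": 0.7},
    "options": {"wait_for_model": True, "use_cache": False},
}

print(missing_required_fields(text_generation_input))
```

The same three required fields (`task`, `model`, `inputs`) recur across the text tasks below, so a check like this could be reused with a different task ID.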
    62  
    63  ### Fill Mask
    64  
    65  Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks.
    66  
    67  | Input | ID | Type | Description |
    68  | :--- | :--- | :--- | :--- |
    69  | Task ID (required) | `task` | string | `TASK_FILL_MASK` |
    70  | Model (required) | `model` | string | The Hugging Face model to be used |
    71  | String Input (required) | `inputs` | string | A string to be filled in; it must contain the [MASK] token (check the model card for the exact name of the mask token) |
    72  | Options | `options` | object | Options for the model |
    73  
    74  | Output | ID | Type | Description |
    75  | :--- | :--- | :--- | :--- |
    76  | Results | `results` | array[object] | Results |
    77  
    78  ### Summarization
    79  
    80  Summarization is the task of producing a shorter version of a document while preserving its important information.
    81  
    82  | Input | ID | Type | Description |
    83  | :--- | :--- | :--- | :--- |
    84  | Task ID (required) | `task` | string | `TASK_SUMMARIZATION` |
    85  | Model (required) | `model` | string | The Hugging Face model to be used |
    86  | String Input (required) | `inputs` | string | String input |
    87  | Parameters | `parameters` | object | Parameters |
    88  | Options | `options` | object | Options for the model |
    89  
    90  | Output | ID | Type | Description |
    91  | :--- | :--- | :--- | :--- |
    92  | Summary Text | `summary_text` | string | The string after summarization |
    93  
    94  ### Text Classification
    95  
    96  Text Classification is the task of assigning a label or class to a given text.
    97  
    98  | Input | ID | Type | Description |
    99  | :--- | :--- | :--- | :--- |
   100  | Task ID (required) | `task` | string | `TASK_TEXT_CLASSIFICATION` |
   101  | Model (required) | `model` | string | The Hugging Face model to be used |
   102  | String Input (required) | `inputs` | string | String input |
   103  | Options | `options` | object | Options for the model |
   104  
   105  | Output | ID | Type | Description |
   106  | :--- | :--- | :--- | :--- |
   107  | Results | `results` | array[object] | Results |
   108  
   109  ### Token Classification
   110  
   111  Token classification is a natural language understanding task in which a label is assigned to some tokens in a text.
   112  
   113  | Input | ID | Type | Description |
   114  | :--- | :--- | :--- | :--- |
   115  | Task ID (required) | `task` | string | `TASK_TOKEN_CLASSIFICATION` |
   116  | Model (required) | `model` | string | The Hugging Face model to be used |
   117  | String Input (required) | `inputs` | string | String input |
   118  | Parameters | `parameters` | object | Parameters |
   119  | Options | `options` | object | Options for the model |
   120  
   121  | Output | ID | Type | Description |
   122  | :--- | :--- | :--- | :--- |
   123  | Results | `results` | array[object] | Results |
   124  
   125  ### Translation
   126  
   127  Translation is the task of converting text from one language to another.
   128  
   129  | Input | ID | Type | Description |
   130  | :--- | :--- | :--- | :--- |
   131  | Task ID (required) | `task` | string | `TASK_TRANSLATION` |
   132  | Model (required) | `model` | string | The Hugging Face model to be used |
   133  | String Input (required) | `inputs` | string | String input |
   134  | Options | `options` | object | Options for the model |
   135  
   136  | Output | ID | Type | Description |
   137  | :--- | :--- | :--- | :--- |
   138  | Translation Text | `translation_text` | string | The string after translation |
   139  
   140  ### Zero Shot Classification
   141  
   142  Zero-shot text classification is a task in natural language processing where a model is trained on a set of labeled examples but is then able to classify new examples from previously unseen classes.
   143  
   144  | Input | ID | Type | Description |
   145  | :--- | :--- | :--- | :--- |
   146  | Task ID (required) | `task` | string | `TASK_ZERO_SHOT_CLASSIFICATION` |
   147  | Model (required) | `model` | string | The Hugging Face model to be used |
   148  | String Input (required) | `inputs` | string | String input |
   149  | Parameters | `parameters` | object | Parameters |
   150  | Options | `options` | object | Options for the model |
   151  
   152  | Output | ID | Type | Description |
   153  | :--- | :--- | :--- | :--- |
   154  | Scores | `scores` | array[number] | A list of floats that correspond to the probability of each label, in the same order as `labels`. |
   155  | Labels | `labels` | array[string] | The list of strings for labels that you sent (in order) |
   156  | Sequence (optional) | `sequence` | string | The string sent as an input |
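
Because `scores` and `labels` are parallel arrays, a consumer usually pairs them up before use. The sketch below shows one way to do that in Python; the output values are made up for illustration, and `rank_labels` is a hypothetical helper, not part of the component.

```python
def rank_labels(output: dict) -> list:
    """Pair each label with its score and sort from most to least likely."""
    pairs = zip(output["labels"], output["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical output, shaped like the table above.
sample_output = {
    "sequence": "I would like my money back for this order.",
    "labels": ["legal", "refund", "faq"],
    "scores": [0.09, 0.88, 0.03],
}

ranked = rank_labels(sample_output)
top_label, top_score = ranked[0]
print(top_label, top_score)
```

Sorting explicitly avoids relying on the order in which a given model returns its labels.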
   157  
   158  ### Question Answering
   159  
   160  Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document.
   161  
   162  | Input | ID | Type | Description |
   163  | :--- | :--- | :--- | :--- |
   164  | Task ID (required) | `task` | string | `TASK_QUESTION_ANSWERING` |
   165  | Model (required) | `model` | string | The Hugging Face model to be used |
   166  | Inputs (required) | `inputs` | object | Inputs |
   167  | Options | `options` | object | Options for the model |
   168  
   169  | Output | ID | Type | Description |
   170  | :--- | :--- | :--- | :--- |
   171  | Answer | `answer` | string | A string that’s the answer within the text. |
   172  | Stop (optional) | `stop` | integer | The index (string-wise) of the end of the answer within the context. |
   173  | Score (optional) | `score` | number | A float that represents how likely it is that the answer is correct |
   174  | Start (optional) | `start` | integer | The index (string-wise) of the start of the answer within the context. |
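
The `start` and `stop` outputs can be used to locate the answer inside the original context. The Python sketch below assumes that the `inputs` object carries `question` and `context` keys (as in the Hugging Face Inference API) and that `stop` is an exclusive end index; both are assumptions, and the output values are illustrative.

```python
def extract_answer(context: str, output: dict) -> str:
    """Slice the answer out of the context, falling back to `answer`."""
    start, stop = output.get("start"), output.get("stop")
    if start is None or stop is None:
        # `start` and `stop` are optional outputs.
        return output["answer"]
    # Assumption: `stop` is exclusive, like a Python slice end.
    return context[start:stop]


qa_inputs = {
    "question": "What's my name?",
    "context": "My name is Clara and I live in Berkeley.",
}
# Hypothetical output, shaped like the table above.
qa_output = {"answer": "Clara", "start": 11, "stop": 16, "score": 0.93}

print(extract_answer(qa_inputs["context"], qa_output))
```

Comparing the sliced text with `answer` is a cheap sanity check that the indices refer to the same context string that was sent.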
   175  
   176  ### Table Question Answering
   177  
   178  Table Question Answering (Table QA) is the task of answering a question about information in a given table.
   179  
   180  | Input | ID | Type | Description |
   181  | :--- | :--- | :--- | :--- |
   182  | Task ID (required) | `task` | string | `TASK_TABLE_QUESTION_ANSWERING` |
   183  | Model (required) | `model` | string | The Hugging Face model to be used |
   184  | Inputs (required) | `inputs` | object | Inputs |
   185  | Options | `options` | object | Options for the model |
   186  
   187  | Output | ID | Type | Description |
   188  | :--- | :--- | :--- | :--- |
   189  | Aggregator (optional) | `aggregator` | string | The aggregator used to get the answer |
   190  | Answer | `answer` | string | The plaintext answer |
   191  | Cells (optional) | `cells` | array[string] | A list of the contents of the cells referenced in the answer |
   192  | Coordinates (optional) | `coordinates` | array[array] | A list of coordinates of the cells referenced in the answer |
   193  
   194  ### Sentence Similarity
   195  
   196  Sentence Similarity is the task of determining how similar two texts are.
   197  
   198  | Input | ID | Type | Description |
   199  | :--- | :--- | :--- | :--- |
   200  | Task ID (required) | `task` | string | `TASK_SENTENCE_SIMILARITY` |
   201  | Model (required) | `model` | string | The Hugging Face model to be used |
   202  | Inputs (required) | `inputs` | object | Inputs |
   203  | Options | `options` | object | Options for the model |
   204  
   205  | Output | ID | Type | Description |
   206  | :--- | :--- | :--- | :--- |
   207  | Scores | `scores` | array[number] | The associated similarity score for each of the given strings |
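
Each entry in `scores` corresponds to one candidate sentence, so the most similar sentence can be found by pairing the two lists. The Python sketch below assumes the `inputs` object uses `source_sentence` and `sentences` keys, as the Hugging Face Inference API does; the scores are illustrative values, and `most_similar` is a hypothetical helper.

```python
def most_similar(inputs: dict, output: dict) -> tuple:
    """Return the (sentence, score) pair with the highest similarity."""
    scored = zip(inputs["sentences"], output["scores"])
    return max(scored, key=lambda pair: pair[1])


similarity_inputs = {
    "source_sentence": "That is a happy person",
    "sentences": [
        "That is a happy dog",
        "That is a very happy person",
        "Today is a sunny day",
    ],
}
# Hypothetical output, shaped like the table above.
similarity_output = {"scores": [0.695, 0.943, 0.257]}

best_sentence, best_score = most_similar(similarity_inputs, similarity_output)
print(best_sentence, best_score)
```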
   208  
   209  ### Conversational
   210  
   211  Conversational response modelling is the task of generating conversational text that is relevant, coherent and knowledgeable given a prompt.
   212  
   213  | Input | ID | Type | Description |
   214  | :--- | :--- | :--- | :--- |
   215  | Task ID (required) | `task` | string | `TASK_CONVERSATIONAL` |
   216  | Model (required) | `model` | string | The Hugging Face model to be used |
   217  | Inputs (required) | `inputs` | object | Inputs |
   218  | Parameters | `parameters` | object | Parameters |
   219  | Options | `options` | object | Options for the model |
   220  
   221  | Output | ID | Type | Description |
   222  | :--- | :--- | :--- | :--- |
   223  | Conversation (optional) | `conversation` | object | A convenience dictionary to send back with the next input (with the new user input added). |
   224  | Generated Text | `generated_text` | string | The answer of the bot |
   225  
   226  ### Image Classification
   227  
   228  Image classification is the task of assigning a label or class to an entire image.
   229  
   230  | Input | ID | Type | Description |
   231  | :--- | :--- | :--- | :--- |
   232  | Task ID (required) | `task` | string | `TASK_IMAGE_CLASSIFICATION` |
   233  | Model (required) | `model` | string | The Hugging Face model to be used |
   234  | Image (required) | `image` | string | The image file |
   235  
   236  | Output | ID | Type | Description |
   237  | :--- | :--- | :--- | :--- |
   238  | Classes | `classes` | array[object] | Classes |
   239  
   240  ### Image Segmentation
   241  
   242  Image Segmentation divides an image into segments where each pixel in the image is mapped to an object.
   243  
   244  | Input | ID | Type | Description |
   245  | :--- | :--- | :--- | :--- |
   246  | Task ID (required) | `task` | string | `TASK_IMAGE_SEGMENTATION` |
   247  | Model (required) | `model` | string | The Hugging Face model to be used |
   248  | Image (required) | `image` | string | The image file |
   249  
   250  | Output | ID | Type | Description |
   251  | :--- | :--- | :--- | :--- |
   252  | Segments | `segments` | array[object] | Segments |
   253  
   254  ### Object Detection
   255  
   256  Object Detection models allow users to identify objects of certain defined classes.
   257  
   258  | Input | ID | Type | Description |
   259  | :--- | :--- | :--- | :--- |
   260  | Task ID (required) | `task` | string | `TASK_OBJECT_DETECTION` |
   261  | Model (required) | `model` | string | The Hugging Face model to be used |
   262  | Image (required) | `image` | string | The image file |
   263  
   264  | Output | ID | Type | Description |
   265  | :--- | :--- | :--- | :--- |
   266  | Objects | `objects` | array[object] | Objects |
   267  
   268  ### Image To Text
   269  
   270  Image-to-text models output text from a given image.
   271  
   272  | Input | ID | Type | Description |
   273  | :--- | :--- | :--- | :--- |
   274  | Task ID (required) | `task` | string | `TASK_IMAGE_TO_TEXT` |
   275  | Model (required) | `model` | string | The Hugging Face model to be used |
   276  | Image (required) | `image` | string | The image file |
   277  
   278  | Output | ID | Type | Description |
   279  | :--- | :--- | :--- | :--- |
   280  | Text | `text` | string | Generated text |
   281  
   282  ### Speech Recognition
   283  
   284  Automatic Speech Recognition (ASR), also known as Speech to Text (STT), is the task of transcribing a given audio to text.
   285  
   286  | Input | ID | Type | Description |
   287  | :--- | :--- | :--- | :--- |
   288  | Task ID (required) | `task` | string | `TASK_SPEECH_RECOGNITION` |
   289  | Model (required) | `model` | string | The Hugging Face model to be used |
   290  | Audio (required) | `audio` | string | The audio file |
   291  
   292  | Output | ID | Type | Description |
   293  | :--- | :--- | :--- | :--- |
   294  | Text | `text` | string | The string that was recognized within the audio file. |
   295  
   296  ### Audio Classification
   297  
   298  Audio classification is the task of assigning a label or class to a given audio.
   299  
   300  | Input | ID | Type | Description |
   301  | :--- | :--- | :--- | :--- |
   302  | Task ID (required) | `task` | string | `TASK_AUDIO_CLASSIFICATION` |
   303  | Model (required) | `model` | string | The Hugging Face model to be used |
   304  | Audio (required) | `audio` | string | The audio file |
   305  
   306  | Output | ID | Type | Description |
   307  | :--- | :--- | :--- | :--- |
   308  | Classes | `classes` | array[object] | Classes |