nlptagger
When you need a program to understand the context of commands.









General info
*This project tags CLI commands. It is not an LLM and is not trying to be one. I use it to generate Go code, but I kept it completely separate so others can enjoy it.
*I will keep working on it, and I hope to improve the phrase tagging and add neural networks in the future.
Background
- Tokenization: This is the very first step in most NLP pipelines. It involves breaking down text into individual units called tokens (words, punctuation marks, etc.). Tokenization is fundamental because it creates the building blocks for further analysis.
- Part-of-Speech (POS) Tagging: POS tagging assigns grammatical categories (noun, verb, adjective, etc.) to each token. It's a crucial step for understanding sentence structure and is often used as input for more complex tasks like phrase tagging.
- Named Entity Recognition (NER): NER identifies and classifies named entities (people, organizations, locations, dates, etc.) in text. This is more specific than POS tagging but still more generic than phrase tagging, as it focuses on individual entities rather than complete phrases.
- Dependency Parsing: Dependency parsing analyzes the grammatical relationships between words in a sentence, creating a tree-like structure that shows how words depend on each other. It provides a deeper understanding of sentence structure than phrase tagging, which focuses on contiguous chunks.
- Lemmatization and Stemming: These techniques reduce words to their base or root forms (e.g., "running" to "run"). They help to normalize text and improve the accuracy of other NLP tasks.
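To make the first and last steps above concrete, here is a minimal sketch of tokenization and suffix-stripping stemming using only the standard library. `Tokenize` and `Stem` are hypothetical helpers written for this illustration; they are not the functions in the `stem` package, and the suffix rules are a toy set rather than a real stemming algorithm such as Porter's.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenize lowercases the text and splits it into word tokens,
// treating any rune that is not a letter or digit as a separator.
func Tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Stem strips a few common English suffixes. It is a toy rule set
// for illustration only and will mis-stem many real words.
func Stem(word string) string {
	for _, suffix := range []string{"ing", "ed", "s"} {
		// Require a few characters to remain so short words are left alone.
		if strings.HasSuffix(word, suffix) && len(word) > len(suffix)+2 {
			stem := strings.TrimSuffix(word, suffix)
			// Undo consonant doubling, e.g. "runn" becomes "run".
			if n := len(stem); suffix != "s" && n > 1 && stem[n-1] == stem[n-2] {
				stem = stem[:n-1]
			}
			return stem
		}
	}
	return word
}

func main() {
	for _, tok := range Tokenize("Generate a webserver, running handlers!") {
		fmt.Printf("%s -> %s\n", tok, Stem(tok))
	}
}
```

Stemming before tagging means "running" and "run" reach the tagger as the same token, which keeps the rule set or training data smaller.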
*Phrase tagging often uses the output of these more generic techniques as input. For example:
- POS tags are commonly used to define rules for identifying phrases (e.g., a noun phrase might be defined as a sequence of words starting with a determiner followed by one or more adjectives and a noun).
- NER can be used to identify specific types of phrases (e.g., a phrase tagged as "PERSON" might indicate a person's name).
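The POS-based rule can be sketched as a small chunker. This is an illustration only, not the logic in `phrasetagger`: the `Token` type, the `NounPhrases` function, and the tag names (`DET`, `ADJ`, `NOUN`, `VERB`, `ADP`) are assumptions made for the example, and the rule is slightly relaxed to a determiner, then any number of adjectives, then one or more nouns.

```go
package main

import "fmt"

// Token pairs a word with its part-of-speech tag.
// The tag names used here are assumed for this example.
type Token struct {
	Word string
	POS  string
}

// NounPhrases returns every contiguous span matching DET ADJ* NOUN+.
func NounPhrases(tokens []Token) [][]Token {
	var phrases [][]Token
	for i := 0; i < len(tokens); {
		if tokens[i].POS != "DET" {
			i++
			continue
		}
		j := i + 1
		// Skip any adjectives after the determiner.
		for j < len(tokens) && tokens[j].POS == "ADJ" {
			j++
		}
		// Require at least one noun to close the phrase.
		nounStart := j
		for j < len(tokens) && tokens[j].POS == "NOUN" {
			j++
		}
		if j > nounStart {
			phrases = append(phrases, tokens[i:j])
			i = j
		} else {
			i++
		}
	}
	return phrases
}

func main() {
	tokens := []Token{
		{"generate", "VERB"}, {"a", "DET"}, {"webserver", "NOUN"},
		{"with", "ADP"}, {"the", "DET"}, {"new", "ADJ"}, {"handler", "NOUN"},
	}
	for _, np := range NounPhrases(tokens) {
		for _, t := range np {
			fmt.Print(t.Word, " ")
		}
		fmt.Println()
	}
}
```

A hand-written rule like this is easy to read and debug, but it only finds the patterns it was told about, which is one reason to learn phrase tags from training data instead.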
Why build this?
- Go is stable: the Go 1 compatibility promise means existing code rarely breaks between releases.
- It is nice not to need terminal drop-downs.
What does it do?
- It tags each word of a command with POS, NER, phrase, and dependency-relation tags.
*I made an overview video on this project.
video
Technologies
*Just Go.
Requirements
How to run as is?
package main

import (
	"fmt"
	"strings"

	modeldata "github.com/golangast/nlptagger/nn"
	"github.com/golangast/nlptagger/tagger/tag"
)

func main() {
	// Load or train the model. You have to create the training file first.
	md, err := modeldata.ModelData("data/training_data.json")
	if err != nil {
		fmt.Println("Error loading or training model:", err)
		return // do not continue with a model that failed to load or train
	}

	// Example sentence to tag.
	sentence := "generate a webserver with the handler dog with the data structure people"

	// Predict all four tag sequences for the sentence.
	predictedPosTags, predictedNerTags, predictedPhraseTags, predictedDRTags := md.PredictTags(sentence)

	// Collect the predictions in a single tag.Tag value.
	predictedTagStruct := tag.Tag{
		PosTag:          predictedPosTags,
		NerTag:          predictedNerTags,
		PhraseTag:       predictedPhraseTags,
		DepRelationsTag: predictedDRTags,
	}

	// Print the sentence again for clarity, then each tag sequence space-separated.
	fmt.Println("Sentence:", sentence)
	fmt.Println("Predicted POS Tag Types:", strings.Join(predictedTagStruct.PosTag, " "))
	fmt.Println("Predicted NER Tag Types:", strings.Join(predictedTagStruct.NerTag, " "))
	fmt.Println("Predicted Phrase Tag Types:", strings.Join(predictedTagStruct.PhraseTag, " "))
	fmt.Println("Predicted Dependency Relation Tag Types:", strings.Join(predictedTagStruct.DepRelationsTag, " "))
}
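The space-separated tag lines printed by the program above can be hard to match back to individual words. A small helper like the following could print each word next to its tag. This is a sketch under an assumption the project does not state here: that each tag slice holds one tag per whitespace-separated word, in order. `AlignTags` and the tag values in `main` are hypothetical and not part of nlptagger.

```go
package main

import (
	"fmt"
	"strings"
)

// AlignTags pairs each whitespace-separated word with the tag at the
// same index, formatted as "word/TAG". Words with no matching tag get
// "?" so a length mismatch is visible instead of causing a panic.
func AlignTags(sentence string, tags []string) []string {
	words := strings.Fields(sentence)
	pairs := make([]string, 0, len(words))
	for i, w := range words {
		tag := "?"
		if i < len(tags) {
			tag = tags[i]
		}
		pairs = append(pairs, w+"/"+tag)
	}
	return pairs
}

func main() {
	// Made-up tags for illustration; real values would come from PredictTags.
	tags := []string{"VERB", "DET", "NOUN"}
	fmt.Println(strings.Join(AlignTags("generate a webserver", tags), " "))
}
```

In the program above, this would be called as `AlignTags(sentence, predictedTagStruct.PosTag)`, and likewise for the other three tag slices.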
- clone it
git clone https://github.com/golangast/nlptagger
- or install gonew to pull down the project quickly
go install golang.org/x/tools/cmd/gonew@latest
gonew github.com/golangast/nlptagger example.com/nlptagger
- cd into nlptagger and run it
cd nlptagger
go run main.go
Repository overview
├── data                      # training data
│   └── training_data.json
├── nn                        # neural network
│   ├── modeldata.go
│   ├── nnu                   # neural network utils
│   └── simplenn              # simple neural network
├── tagger                    # tagger folder
│   ├── dependencyrelation    # dependency relation
│   ├── nertagger             # NER tagging
│   ├── phrasetagger          # phrase tagging
│   ├── postagger             # POS tagging
│   ├── stem                  # stemming tokens before tagging
│   ├── tag                   # tag data structure
│   └── tagger.go
└── trained_model.gob         # model
Overview of the code.
*All this does is tag sentences. It is not an LLM, only an attempt at tagging commands.
## Things to remember
* It is not an LLM and is not trying to be one.
* It is only for CLI commands.
Just added
*the project
Special thanks
Why Go?