Financial News Classifier and Sentiment Analysis (LLM & FinBERT)
Text cleaning includes the following steps (but is not limited to them); a minimal sketch covering several of these appears after the list:
- Lowercasing
- Removal of punctuation
- Removal of stopwords
- Removal of frequent words
- Removal of very rare words
- Stemming
- Lemmatization
- Removal of emojis
- Removal of emoticons
- Conversion of emoticons to words
- Conversion of emojis to words
- Use of regular expressions (removal of URLs, HTML tags, phone numbers, email IDs, etc.)
- Chat words conversion
- Spelling correction
- Removal of non-English words
Example Notebook
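Below is a minimal sketch of a few of these steps using Python's `re`, `string`, and NLTK. The cleaning order, the sample headline, and the `clean_text` helper are illustrative assumptions, not a prescribed pipeline.

```python
import re
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text: str) -> str:
    """Illustrative helper covering a subset of the steps listed above."""
    text = text.lower()                                # lowercasing
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = re.sub(r"<.*?>", "", text)                  # remove HTML tags
    text = re.sub(r"\S+@\S+", "", text)                # remove email IDs
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]          # remove stopwords
    tokens = [lemmatizer.lemmatize(t) for t in tokens]                # lemmatization
    return " ".join(tokens)

# Made-up headline, used only to exercise the function.
print(clean_text("Stocks RALLIED after the Fed's statement! Details: https://example.com <b>read more</b>"))
```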
For topic modeling we can use the following (a plain-LDA sketch follows this list):
- LDA
- Guided (Seeded) LDA
- Anchored CorEx
Example Notebook
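As a starting point, here is a minimal sketch of plain LDA with scikit-learn; Guided (Seeded) LDA and Anchored CorEx need their own libraries and are not shown. The toy corpus of financial headlines and the two-topic setting are assumptions made purely for illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus of made-up financial headlines.
docs = [
    "fed raises interest rates to fight inflation",
    "tech stocks rally as earnings beat expectations",
    "oil prices climb on supply concerns",
    "central bank signals further rate hikes",
    "chipmaker shares surge after strong quarterly earnings",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# n_components=2 is arbitrary for this tiny example.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```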
- Text summarization with a Seq2Seq model
- Text summarization with transformer architectures
- Text summarization with Hugging Face Transformers
Example Notebooks
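For the Hugging Face route, a minimal sketch using the `transformers` summarization pipeline is shown below. The checkpoint `sshleifer/distilbart-cnn-12-6` and the sample article are illustrative assumptions; any summarization checkpoint works.

```python
from transformers import pipeline

# Model choice is an assumption; swap in any summarization checkpoint.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Made-up article used only for demonstration.
article = (
    "The central bank left its benchmark rate unchanged on Wednesday, "
    "citing cooling inflation, but warned that further hikes remain "
    "possible if price pressures return later in the year."
)

print(summarizer(article, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])
```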
Official Libraries
First-party cool stuff made with love by Hugging Face.
- transformers – State-of-the-art natural language processing for Jax, PyTorch and TensorFlow.
- datasets – The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use, and efficient data manipulation tools.
- tokenizers – Fast state-of-the-art tokenizers optimized for research and production.
- knockknock – Get notified when your training ends with only two additional lines of code.
- accelerate – A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision.
- autonlp – Train state-of-the-art natural language processing models and deploy them in a scalable environment automatically.
- nn_pruning – Prune a model while finetuning or training.
- huggingface_hub – Client library to download and publish models and other files on the huggingface.co hub.
- tune – A benchmark for comparing Transformer-based models.
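As a taste of how two of these libraries fit together, here is a minimal, hedged sketch that tokenizes a slice of a dataset with `datasets` and `transformers`. The `imdb` dataset and `distilbert-base-uncased` checkpoint are illustrative choices, not project requirements.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Dataset and checkpoint names are assumptions for illustration only.
dataset = load_dataset("imdb", split="train[:100]")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Pad/truncate to a fixed length so examples can be batched.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)
print(tokenized[0]["input_ids"][:10])
```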
Tutorials
Learn how to use Hugging Face toolkits, step-by-step.
- Official Course (from Hugging Face) – The official course series provided by Hugging Face.
- transformers-tutorials (by @nielsrogge) – Tutorials for applying multiple models on real-world datasets.
NLP Toolkits
NLP toolkits built upon Transformers. Swiss Army!
- AllenNLP (from AI2) – An open-source NLP research library.
- Graph4NLP – Enabling easy use of Graph Neural Networks for NLP.
- Lightning Transformers – Transformers with PyTorch Lightning interface.
- Adapter Transformers – Extension to the Transformers library, integrating adapters into state-of-the-art language models.
- Obsei – A low-code AI workflow automation tool that performs various NLP tasks in the workflow pipeline.
- Trapper (from OBSS) – State-of-the-art NLP through transformer models in a modular design and consistent APIs.
Text Representation
Converting a sentence to a vector.
- Sentence Transformers (from UKPLab) – Widely used encoders computing dense vector representations for sentences, paragraphs, and images.
- WhiteningBERT (from Microsoft) – An easy unsupervised sentence embedding approach with whitening.
- SimCSE (from Princeton) – State-of-the-art sentence embedding with contrastive learning.
- DensePhrases (from Princeton) – Learning dense representations of phrases at scale.
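To make "sentence to vector" concrete, here is a minimal sketch with Sentence Transformers. The `all-MiniLM-L6-v2` checkpoint and the example sentences are assumptions chosen for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Checkpoint name is an assumption; any Sentence Transformers model works.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Shares fell sharply after the earnings miss.",
    "The stock dropped following disappointing results.",
    "The weather was sunny all weekend.",
]
embeddings = model.encode(sentences)

# Cosine similarity: the first two sentences should score highest.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```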
Inference Engines
Highly optimized inference engines implementing Transformers-compatible APIs.
- TurboTransformers (from Tencent) – An inference engine for transformers with a fast C++ API.
- FasterTransformer (from Nvidia) – A script and recipe to run the highly optimized transformer-based encoder and decoder component on NVIDIA GPUs.
- lightseq (from ByteDance) – A high-performance inference library for sequence processing and generation, implemented in CUDA.
- FastSeq (from Microsoft) – Efficient implementations of popular sequence models (e.g., BART, ProphetNet) for text generation, summarization, translation, etc.
Model Scalability
Parallelizing models across multiple GPUs.
- Parallelformers (from TUNiB) – A library for model parallel deployment.
- OSLO (from TUNiB) – A library that supports various features to help you train large-scale models.
- DeepSpeed (from Microsoft) – DeepSpeed ZeRO scales to any model size with little to no change to the model. Integrated with the HF Trainer.
- fairscale (from Facebook) – Also implements the ZeRO protocol. Integrated with the HF Trainer.
- ColossalAI (from Hpcaitech) – A Unified Deep Learning System for Large-Scale Parallel Training (1D, 2D, 2.5D, 3D and sequence parallelism, and ZeRO protocol).
Model Compression/Acceleration
Compressing or accelerating models for improved inference speed.
- torchdistill – PyTorch-based modular, configuration-driven framework for knowledge distillation.
- TextBrewer (from HFL) – State-of-the-art distillation methods to compress language models.
- BERT-of-Theseus (from Microsoft) – Compressing BERT by progressively replacing the components of the original BERT.
Adversarial Attack
Conducting adversarial attacks to test model robustness.
- TextAttack (from UVa) – A Python framework for adversarial attacks, data augmentation, and model training in NLP.
- TextFlint (from Fudan) – A unified multilingual robustness evaluation toolkit for NLP.
- OpenAttack (from THU) – An open-source textual adversarial attack toolkit.
Style Transfer
Transfer the style of text! Now you know why it's called a transformer?
- Styleformer – A neural language style transfer framework to transfer text smoothly between styles.
- ConSERT – A contrastive framework for self-supervised sentence representation transfer.
Sentiment Analysis
Analyzing the sentiment and emotions of human beings.
- conv-emotion – Implementation of different architectures for emotion recognition in conversations.
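Since this project targets financial news sentiment (see the title), a minimal FinBERT sketch may be useful. `ProsusAI/finbert` is one publicly available FinBERT checkpoint on the Hugging Face Hub, chosen here purely for illustration, and the headlines are made up.

```python
from transformers import pipeline

# Checkpoint choice is an assumption; ProsusAI/finbert is one public FinBERT.
classifier = pipeline("text-classification", model="ProsusAI/finbert")

# Made-up headlines used only to exercise the classifier.
headlines = [
    "Company beats earnings expectations and raises full-year guidance.",
    "Regulator fines the bank over compliance failures.",
]
for h in headlines:
    print(h, "->", classifier(h))
```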
Grammatical Error Correction
You made a typo! Let me correct it.
- Gramformer – A framework for detecting, highlighting, and correcting grammatical errors on natural language text.
Translation
Translating between different languages.
- dl-translate – A deep learning-based translation library based on HF Transformers.
- EasyNMT (from UKPLab) – Easy-to-use, state-of-the-art translation library and Docker images based on HF Transformers.
Knowledge and Entity
Learning knowledge, mining entities, connecting the world.
- PURE (from Princeton) – Entity and relation extraction from text.
Speech
Speech processing powered by HF libraries. Need for speech!
- s3prl – A self-supervised speech pre-training and representation learning toolkit.
- speechbrain – A PyTorch-based speech toolkit.
Multi-modality
Understanding the world from different modalities.
- ViLT (from Kakao) – A vision-and-language transformer without convolution or region supervision.
Reinforcement Learning
Combining RL magic with NLP!
- trl – Fine-tune transformers using Proximal Policy Optimization (PPO) to align with human preferences.
Question Answering
Searching for answers? Transformers to the rescue!
- Haystack (from deepset) – End-to-end framework for developing and deploying question-answering systems in the wild.
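Haystack wraps retrieval, readers, and deployment end to end; the sketch below shows only the extractive-QA core using a plain `transformers` pipeline (not the Haystack API). The checkpoint and context passage are illustrative assumptions.

```python
from transformers import pipeline

# A plain transformers QA pipeline; model choice is an assumption.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# Made-up context passage for demonstration.
context = (
    "The company reported quarterly revenue of 4.2 billion dollars, "
    "up 12 percent year over year, driven by its cloud division."
)
print(qa(question="What drove revenue growth?", context=context))
```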
Recommender Systems
I think this is just right for you!
- Transformers4Rec (from Nvidia) – A flexible and efficient library powered by Transformers for sequential and session-based recommendations.
Evaluation
Evaluating NLP outputs powered by HF datasets!
- Jury (from OBSS) – An easy-to-use tool for evaluating NLP model outputs, specifically for NLG (natural language generation), offering various automated text-to-text metrics.
Neural Search
Search, but with the power of neural networks!
- Jina Integration – Jina integration of Hugging Face Accelerated API.
- Weaviate Integration (text2vec) (QA) – Weaviate integration of Hugging Face Transformers.
- ColBERT (from Stanford) – A fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Cloud
Cloud makes your life easy!
- Amazon SageMaker – Making it easier than ever to train Hugging Face Transformer models in Amazon SageMaker.
Hardware
The infrastructure enabling the magic to happen.
- Qualcomm – Collaboration on enabling Transformers in Snapdragon.
- Intel – Collaboration with Intel for configuration options.
NOTE: This list of resources is taken entirely from the Hugging Face repository; I am just sharing it here as a wiki. Please visit the repository for more.