Disclaimer! I am NOT an AI influencer! This is literally just me learning how to implement a Hugging Face model for the first time.
And Boy Was it Hard! :))
What is this even about?
I built a document question-answering bot for this demonstration. It takes an image & we can ask questions about that image.
I had no idea how to implement language models going into this. So it was a really fun experience. Now this is part of a bigger project. Today I'm just sharing one part of it.
Going Crazy
This was for a hackathon project where I was trying to use @streamlit & Hugging Face. I had never even had a Hugging Face account & had only basic, tutorial-level experience with @streamlit. But I really wanted to learn & implement something on my own. I was tired of following tutorials, & it didn't matter whether it was a standard solution or not!
But like the heading says, I did go crazy a couple times :)
The beginning of Insanity
Now I'm a web developer who had only heard about Hugging Face & didn't really care about the hype. But then I decided to experiment a little with this tech. I did not have the necessary setup on my local device, so first I had to install PyTorch & Tesseract OCR on my local PC. I will not be sharing this trauma :)
Those who know...know 😫
But I will take you through how I implemented it!
First we need the basic ingredients! A transformer!
How does it work? - I have no f**king Idea!
What does it do? - Makes language model go "brrrr"
And a library to read Image files.
Basic Imports:
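Here's a minimal sketch of the setup check plus the imports. The three imports at the bottom are the ones this post talks about (Streamlit, the transformers pipeline, and PIL). The `REQUIRED` table and the `missing_packages` helper are just my illustration, and the pip names are assumptions based on the usual package names. The Tesseract OCR program itself is a separate system install that pip won't handle.

```python
import importlib.util

# Import name -> pip package name (assumed, the usual names on PyPI).
# torch and pytesseract are not imported directly by the app, but the
# document QA model needs them at runtime.
REQUIRED = {
    "streamlit": "streamlit",
    "transformers": "transformers",
    "PIL": "pillow",
    "torch": "torch",
    "pytesseract": "pytesseract",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of every package that cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Install these first: pip install " + " ".join(missing))
else:
    # The basic imports the rest of the app relies on.
    import streamlit as st
    from transformers import pipeline
    from PIL import Image
```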
Now to initialize our pipeline.
What does that even mean? Basically we are selecting a language model from the Hugging Face model catalogue, and also setting what type of model it is. Here comes the main point of this blog: Small Language Models. At first I did try a popular Large Language Model (Mistral), but here's the thing: after a pipeline is initialized, the first run of the program has to download the model onto my local device. I have shitty internet, & the Mistral-8B was like 2GB+. Every time I started the project, the download would get halfway & give up.
So I opted for a Small Language Model called impira/layoutlm-document-qa. This is a 500 MB model that did a good enough job of answering questions from an uploaded document. But there are some constraints: the uploaded document needs to be an image, hence the need for a separate library to read image files (PIL).
Pipeline Initialization:
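Roughly, initializing the pipeline looks like the sketch below. The task name `document-question-answering` and the model id are the real Hugging Face identifiers. The function name `load_qa_pipeline` and the lazy import are my own choices for illustration, not copied from hemo.py.

```python
MODEL_ID = "impira/layoutlm-document-qa"
TASK = "document-question-answering"


def load_qa_pipeline(model_id=MODEL_ID):
    """Build the document QA pipeline.

    The first call downloads the model weights (around 500 MB for this
    model), so expect it to be slow once.
    """
    # Imported here so that simply defining the function stays cheap.
    from transformers import pipeline

    return pipeline(TASK, model=model_id)
```

Calling `qa = load_qa_pipeline()` gives back a callable that you can hand an image and a question.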
But here's the hard lesson I learned. I used Streamlit Cloud to deploy the project, so it was running on a cloud GPU. But after a few minutes of usage it would become WAY TOO resource intensive & the project would shut down due to Streamlit Cloud's resource limitations. I couldn't figure this out for SOO long. Streamlit reruns the whole script on every interaction, so without caching the model gets set up again & again. But then Streamlit came to the rescue again. Using st.cache we can cache our data so that the app becomes less resource intensive. But I was using st.cache_data at first. Now this API only caches images or other data, NOT AI MODELS, because AI language models are classified as resources. So later I had to switch to st.cache_resource, which finally solved the problem!
Caching Pipeline Initialization:
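A sketch of the cached version, assuming the loader is wrapped in a function (the name `load_qa_pipeline` is hypothetical). The `except ImportError` fallback to `functools.lru_cache` is only there so the snippet can be tried without Streamlit installed. It is a stand-in I added, not something Streamlit provides.

```python
try:
    import streamlit as st

    # The real thing: caches global resources such as ML models.
    cache_resource = st.cache_resource
except ImportError:
    # Stand-in (my assumption, for trying this outside Streamlit only).
    from functools import lru_cache

    cache_resource = lru_cache(maxsize=None)


@cache_resource
def load_qa_pipeline():
    """Create the pipeline once and reuse the same object afterwards."""
    # Lazy import: transformers is only loaded when the model is needed.
    from transformers import pipeline

    return pipeline(
        "document-question-answering",
        model="impira/layoutlm-document-qa",
    )
```

With the decorator in place, the pipeline is built on the first call and the same object is handed back on later reruns instead of loading the model again.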
The next steps were pretty easy once I solved the main problem.
In the next stage I used an if-statement to check whether an image file has been uploaded & then loaded the pipeline so the SLM could read the file.
Verify Image Upload:
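A minimal sketch of the upload check. `st.file_uploader` and `st.info` are real Streamlit calls; the function name, the label text, and the allowed file types are placeholders I picked. The function takes the `streamlit` module as an argument only so it can be tried with a stub object.

```python
ALLOWED_TYPES = ["png", "jpg", "jpeg"]  # assumed; adjust to what you accept


def get_uploaded_image(st):
    """Show the uploader and return the uploaded file, or None if empty.

    `st` is the streamlit module, passed in by the caller.
    """
    uploaded_file = st.file_uploader("Upload a document image", type=ALLOWED_TYPES)
    if uploaded_file is None:
        # Nothing uploaded yet, so there is nothing for the model to read.
        st.info("Upload an image of a document to start asking questions.")
        return None
    return uploaded_file
```

In the app this would be called as `uploaded_file = get_uploaded_image(st)` after `import streamlit as st`, and the pipeline only needs to be loaded when the result is not `None`.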
After that I initialized a form using st.form to submit a question.
Initialize form:
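Something like this sketch would set up the form. `st.form`, `st.text_input`, and `st.form_submit_button` are real Streamlit calls; the form key, the labels, and the function name are made up for illustration. As a design choice of mine, `st` is passed in as an argument so the logic can be checked without running Streamlit.

```python
def ask_question_form(st):
    """Render a one-field form and return the question once submitted.

    `st` is the streamlit module, passed in by the caller. Returns None
    until the user submits a non-empty question.
    """
    with st.form("question_form"):
        question = st.text_input("Ask a question about the document")
        submitted = st.form_submit_button("Ask")

    # Only hand back a question when the button was pressed and the
    # text box was not left blank.
    if submitted and question.strip():
        return question.strip()
    return None
```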
If a question is submitted, the image is opened & the AI model queries it.
Open Query:
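A rough sketch of the query step. The keyword arguments `image`, `question`, and `top_k` are what the transformers document QA pipeline accepts; the function name and `top_k=3` are my assumptions for how to ask for several candidates. The pipeline docs describe the result as a dict or a list of dicts, so the sketch normalises it to a list.

```python
def query_document(qa, image, question, top_k=3):
    """Ask the pipeline a question about an already opened image.

    `qa` is the document QA pipeline (any callable taking the same
    keyword arguments works). Always returns a list of answer dicts.
    """
    results = qa(image=image, question=question, top_k=top_k)
    # A single answer may come back as one dict, so wrap it in a list.
    if isinstance(results, dict):
        results = [results]
    return list(results)
```

In the app, the image would come from PIL, for example `Image.open(uploaded_file)`, since the object Streamlit gives you for an upload behaves like a file.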
Now the query doesn't just produce one answer. Remember, this is a language model: it doesn't understand human language, it can only make predictions based on its training data. So in this case it produces multiple candidate answers, each with a score.
Finally it will choose the most probable answer and show it to the user!
Get the best answer:
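Picking the winner is the easy part. Here's a small sketch, assuming each result is a dict with `answer` and `score` keys (that is the shape the pipeline returns). The helper name and the sample data are made up.

```python
def best_answer(results):
    """Return the answer dict with the highest score, or None if empty."""
    if not results:
        return None
    return max(results, key=lambda result: result["score"])


# Made-up example data in the shape the pipeline returns.
candidates = [
    {"answer": "2019-02-11", "score": 0.12},
    {"answer": "us-001", "score": 0.85},
]
top = best_answer(candidates)
print(f"{top['answer']} (score {top['score']:.2f})")  # us-001 (score 0.85)
```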
Here's the main file: https://github.com/ShatilKhan/Hemo/blob/main/hemo.py
There are a lot of other features as it is part of a larger project; I just explained the part where I used a language model.
Hopefully I'll write more about other features of this project soon!
Happy Coding!