AI Rx- Gen AI in Healthcare Project 1 - Mining Health Data Repository with GenAI

 Ganesh Venkataramanan

09:52
correct, my question is the same: like the data from Kaggle, can't we get the data directly from these data sources?
Vivek Singhal
09:53
This program (https://cellstrathub.com/course/gen-ai?cardId=2) has two versions :- 1) Free Live classes open to all via our meetup page https://www.meetup.com/disrupt-4-0/events/ 2) For class recordings and projects, please enroll formally at https://buy.stripe.com/aEU5lT3eX0Ic26k7sv
Ganesh Venkataramanan
09:56
Can I say that we are creating synthetic data from various data sources and making the LLM respond based on our prompts? Is that what we are trying to achieve?
Milon Mahapatra
09:57
I think that is actual data which is being collected through scraping
xdfvtt
10:01
Can you please explain the project architecture, probably in slide 3?
Ganesh Venkataramanan
10:01
right, I think this process of bringing the data from various sources is called data synthesis, where the data stays somewhere else but we group it together, embed it, and move it to the vector DB for our queries. Please comment with your thoughts
Vijayakumar KJ
10:01
how to bring different data sources to your LLM model database?
Vivek Singhal
10:03
Please feel free to ask questions by speaking as well
Vijayakumar KJ
10:04
what is the base LLM model used in healthcare?
how to speed up the LLM model training?
Surya Putchala
10:04
Med-PaLM 2
google "medical LLMs"
Vivek Singhal
10:06
Welcome to CellBot - our Health Intelligence product www.cellbot.ai
Ganesh Venkataramanan
10:07
so grouping the data from various sources and bringing it into our LLM - is that what we are trying to do here?
Vijayakumar KJ
10:07
how to speed up the customized LLM model ?
Surya Putchala
10:08
Vivek, please check the github link I shared. I will ping you on LinkedIn soon!
Amrita Singh
10:08
share here plz
Vivek Singhal
10:08
Sure Surya
Surya Putchala
10:08
Amrita Singh
10:09
Thanks :-)
xdfvtt
10:10
Thanks Surya for sharing Amlan S.
Surya Putchala
10:10
This is super exciting field! Dolcy is doing a wonderful job.
Vivek Singhal
10:10
Great to hear Surya !
xdfvtt
10:11
Thanks Vivek and team for hosting such a cool webinar. Also kudos to Dolcy
Vivek Singhal
10:11
Try our Health Intelligence product CellBot https://imagineview.com/cellbot-landing?title=CELLBOT
Thanks Amlan !
suresh kumar
10:12
Is LLM only used for healthcare data?
Vivek Singhal
10:12
LLM is generalized AI model for text generation and analytics (e.g. ChatGPT)
UNNATI GULATY
10:12
so we need to modify the search and compare the outcomes. How do we parametrize it?
Ganesh Venkataramanan
10:13
if the LLM is a generic one that we are enriching with external data, we can use any LLM, right... like GPT-4 etc.
Vivek Singhal
10:13
yes, you can do RAG or prompt engineering on all LLMs
Ganesh Venkataramanan
10:13
I mean, we don't need to go for a healthcare-specific LLM here....
fine thanks Vivek
Vivek Singhal
10:14
Med-PaLM and BioBERT are also available
suresh kumar
10:14
ok. thanks
Vivek Singhal
10:14
in our product https://imagineview.com/cellbot-landing?title=CELLBOT we are RAG-mining PubMed and arXiv
with GPT-4
Mithlesh
10:15
Can we consider this chunk as target vector?
Sasi kiran
10:15
what are some of the strategies for chunking
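One widely used chunking strategy is a fixed-size window with overlap, so that sentences cut at a boundary still appear whole in a neighbouring chunk. The following is only a minimal sketch of that idea, not the method used in the session; the function name `chunk_text` and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words; neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks
```

Other strategies split on sentences, paragraphs, or section headings instead of a fixed word count; which one works best depends on the documents and the retriever.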
Surya Putchala
10:16
Vivek, the link asks for login id/pwd
Prakash shanbhag
10:17
When we give additional context fetched from RAG to LLM, does it impact the latency ? Is there any way to not pass the context in each query but instead train LLM one time and then use it without context ? Is this what is called fine tuning ?
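To make the question concrete: in RAG the retrieved context is looked up and prepended to the prompt on every query, which is where the extra latency and prompt length come from, whereas fine-tuning changes the model weights once. A rough sketch of the per-query step follows, assuming the chunks have already been embedded; `cosine` and `build_rag_prompt` are hypothetical helper names, and the vectors are toy values rather than real embeddings.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_rag_prompt(question: str, query_vec: list[float],
                     chunks: list[tuple[str, list[float]]], k: int = 2) -> str:
    """Pick the k chunks most similar to the query and prepend them as context."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The returned string is what would be sent to the LLM, so a larger `k` means more tokens to process per request.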
Vivek Singhal
10:17
yes Surya - you have to create a login
Surya Putchala
10:17
okay. thank you!
Surya Putchala
10:18
Let's not get into technical details for the time being please? Let's see the healthcare domain and its use cases please?
Vivek Singhal
10:19
Good point - we are balancing the two domains
Deepak Kumar
10:19
This error is coming in the CellBot chat - "request failed with status code 500"
Vivek Singhal
10:19
it is tech + healthcare to bridge the gap
Vivek Singhal
10:20
we need to get SSL on www.cellbot.ai, which is why it gives an error for some. So better to try https://imagineview.com/cellbot-landing?title=CELLBOT
Deepak Kumar
10:21
okay, and thank you so much.
xdfvtt
10:22
We can use LLM as judge to evaluate the approach, e.g. GPT4
Vivek Singhal
10:22
Welcome all - I am vivek at CellStrat AI Lab, If you wish to enquire about our courses (https://cellstrathub.com/course) or product internships / collaboration for www.imagineview.com or https://imagineview.com/cellbot-landing?title=CELLBOT, please reach out to me at 9742800566
Vivek Singhal
10:27
On our product CellBot we have said this "Note: This chatbot provides general health and pharma information. This information has not been validated by any regulatory agency or authorized experts. For specific medical or pharma advice, please contact your authorized healthcare provider."
Vivek Singhal
10:29
Disclaimer - Our product CellBot and this Project series is not authorized medical content. This is for researchers only and not approved by any Regulatory body
Ashta Sat
10:29
One of the largest Alternative medicine AI - downloadable - https://brighteon.ai/Home/
Satyajeet Azad
10:30
So human-in-loop or experts of domain required to validate the response generated from LLM?
Vivek Singhal
10:30
Human in Loop and model re-training can increase accuracy
xdfvtt
10:32
And in turn it would increase the acceptability / credibility of the solution
Ganesh Venkataramanan
10:35
In your architecture, you have shown that, based on the user's initial prompt, you scrape from the sites - is my understanding correct?
Surya Putchala
10:37
Check this paper....
Things are going fast!
Indrajit CS
10:38
That's right Ganesh
Surya Putchala
10:39
There are a lot of point solutions for Radiology. Niramai is one such. Check it out
Anirved Pandey
10:39
does it mean that only organizations with astronomical budgets are capable of getting and training LLMs (from scratch) on data such as MRIs and heartbeat waves etc.?
Ganesh Venkataramanan
10:39
Thanks Indrajit, so you do a full update of the vector DB every time... meaning you truncate and reload it, isn't it?
that increases latency and hurts performance then
Myst Social
10:39
no why full update ?
Praveen Kumar
10:41
During this course, will you also touch on the usage of multiple LLM models?
Indrajit CS
10:41
We update only the latest papers
not from scratch
Vivek Singhal
10:42
Ganesh Venkataramanan
10:42
Thanks Indrajit, so you keep the data incremental on the vector DB then
is that right?
Vivek Singhal
10:42
Module 7: Vision in Healthcare. Implementing advanced vision technologies to improve diagnostics and treatment in healthcare. Topics: Understanding Imaging Data; Explore Xray-GPT; LLM Inferencing for Chest X-Ray Analysis
Indrajit CS
10:42
yes
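The incremental approach confirmed here (adding only new papers instead of rebuilding the whole store) can be sketched roughly as follows. This is a toy illustration under stated assumptions: `InMemoryVectorStore` is a hypothetical stand-in for a real vector database, `embed` is a fake deterministic embedding, and the `pmid-…` identifiers are made-up keys; none of this is the actual CellBot code or any vector DB's API.

```python
import hashlib

def embed(text: str) -> list[float]:
    """Toy deterministic stand-in for a real embedding model (assumption)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:8]]

class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB, keyed by document id."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}  # doc_id -> {"text", "vector"}

    def upsert_new(self, papers: dict[str, str]) -> list[str]:
        """Embed and store only papers whose id is not in the store yet."""
        added = []
        for doc_id, text in papers.items():
            if doc_id in self.records:
                continue  # already indexed, skip re-embedding
            self.records[doc_id] = {"text": text, "vector": embed(text)}
            added.append(doc_id)
        return added
```

Because existing ids are skipped, a repeated search on the same topic only pays the embedding cost for papers that were not seen before.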
Ganesh Venkataramanan
10:43
okay, so you check the source metadata of the doc or something and then you take only the updated part from there
what type of vector DB do you use here?
Myst Social
10:44
Ganesh - As per my understanding, when you upload any file or new data chunks into a vector DB, we have an option to append the data to the same index or we can create different indexes
Vaibhav Nakrani
10:44
There are lots of techniques nowadays to fine-tune these models with parameter efficiency. Meaning the cost is not substantial. Helpful to develop a POC.
Myst Social
10:45
chances of hallucination increase if we keep a lot of data in the same index
and yes, because a vector DB is index-based, doc_id maintains indexes for fast searching; the concept would be similar to Elasticsearch
Vivek Singhal
10:49
Dolcy is a trained medical professional from Christian Medical College Vellore and also an AI professional with a post-graduate qualification in AI/ML. So able to straddle both healthcare and AI. In healthcare AI, both health and AI skills are critical
Surya Putchala
10:50
Awesome Skills....both horizontal and vertical skills...a rarity!
Vivek Singhal
10:51
yes Surya !
Ganesh Doosa
10:51
is this code available on GitHub?
Vivek Singhal
10:52
this code is part of our Health AI course and provided to enrolled students only
Surya Putchala
10:52
Can you please share the schedule of the classes?
Ganesh Venkataramanan
10:53
I enrolled for the Gen AI sessions on Sundays, Vivek. Or does this need a different enrollment?
Vivek Singhal
10:53
TOC here https://cellstrathub.com/course/gen-ai?cardId=2. Classes rendered alternate Saturdays starting today. Live classes are free and available on https://www.meetup.com/disrupt-4-0/events/
Ganesh this is different and new course
Gen AI course and Health AI courses are different
Surya Putchala
10:54
One class every fortnight, it looks like
Vivek Singhal
10:55
yes Surya
Ganesh Venkataramanan
10:56
ok vivek got it
Sasi kiran
10:56
some health reports can have graphs; will the chunking handle multimodal content?
Ganesh Venkataramanan
10:56
thanks
Ashta Sat
10:56
Is it biased to only mainstream medicine or does it include other sources like alternative medicine, functional medicine etc. Is there a filter in-built for such things and how to overcome that?
Myst Social
10:57
I don't think anything is readily available; models need to be trained to get the kind of output we need
The idea here is how to train and how to strategize the architecture to get accuracy and no hallucination
Vivek Singhal
10:59
This class topic is what we have implemented in our CellBot product https://imagineview.com/cellbot-landing?title=CELLBOT. You can create login and try it out
Myst Social
11:00
an LLM is an external brain; it will respond according to how it has been trained. If one person trains it, it works one way; if 100 people train it, it will respond like a mess :-D
Sorry, pun intended :-)
Vijayakumar KJ
11:00
Thank you Dolcy and Vivek for the session!
It was great!
Prasad K
11:01
The model is developed by CellStrat, isn't it? How often do you update it?
Milon Mahapatra
11:01
how to connect vision model with llm ?
Mohamed Ashraf
11:01
what is the over all plan/agenda of this course
Indrajit CS
11:01
Yes Prasad
We keep on developing it
Vijayakumar KJ
11:01
I need to drop off...
Deepak Kumar
11:01
can you please show the first two slides
Indrajit CS
11:02
Milon: We can use GPT-4o
It is a multimodal model
Vivek Singhal
11:03
Mohamed - check TOC at https://cellstrathub.com/course/gen-ai?cardId=2. these classes will be rendered alternate Saturdays
Vivek Singhal
11:06
Health AI Course TOC here https://cellstrathub.com/course/gen-ai?cardId=2. Classes rendered alternate Saturdays starting today. Live classes are free and available on https://www.meetup.com/disrupt-4-0/events/
Deepak Kumar
11:06
can you please share that pdf with us.
Ganesh Venkataramanan
11:06
what is the difference between the easy scraping and the earlier one that you showed?
Anirved Pandey
11:07
can we use threading so that we can achieve parallel processing of scraping? anyone can answer this.
Karthik B.S
11:07
How much time does it take to scrape? And is there any way to run this code on a server to speed up the scraping?
Indrajit CS
11:08
Anirved: We are already doing that
Use the live tool
You would see how fast it is
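Since scraping is dominated by waiting on the network, threads are a reasonable fit for the parallelism asked about above. A minimal sketch using Python's standard `concurrent.futures` follows; it is not the implementation behind the live tool. `fetch_page` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the `example.org` URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_page(url: str) -> str:
    """Hypothetical stand-in for an HTTP request; sleeps to simulate network wait."""
    time.sleep(0.1)
    return f"<html>content of {url}</html>"

def scrape_all(urls: list[str], max_workers: int = 8) -> list[str]:
    """Fetch all URLs concurrently; results come back in the same order as `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_page, urls))

if __name__ == "__main__":
    pages = scrape_all([f"https://example.org/paper/{i}" for i in range(8)])
    print(len(pages), "pages fetched")
```

With eight workers the eight simulated requests overlap instead of running one after another; real sites may rate-limit, so `max_workers` usually needs to stay modest.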
xdfvtt
11:08
What kind of inference infra are you currently using?
Indrajit CS
11:08
AWS
xdfvtt
11:09
Awesome
Anirved Pandey
11:09
@Indrajit CS which tool is being used?
Ganesh Venkataramanan
11:09
I believe these will run on a serverless PaaS like AWS Lambda or Azure Functions @karthik.
Indrajit CS
11:09
We are using Lambda
Anirved Pandey
11:09
okay
Indrajit CS
11:09
Right Ganesh
Vivek Singhal
11:09
AWS Stack
Welcome all - I am vivek at CellStrat AI Lab, If you wish to enquire about our courses (https://cellstrathub.com/course) or product internships / collaboration for www.imagineview.com or https://imagineview.com/cellbot-landing?title=CELLBOT, please reach out to me at 9742800566
Ganesh Venkataramanan
11:10
but the waiting time or the performance is something that is going to be slow...
I mean the end user needs to wait for the data to be updated in the vector DB
Indrajit CS
11:10
Ganesh Venkataramanan
11:10
okay will try indrajit, thanks
Indrajit CS
11:11
sure, most welcome
xdfvtt
11:12
Is it possible to open source the notebooks/codes?
Myst Social
11:12
Ganesh, while scraping, a layer can be created where all the scraped data is put into a file or some data source before vectorizing it. This will help to cache the scraped data and reuse it rather than scraping every time
just a thought
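The caching layer suggested here could look roughly like the sketch below: scraped text is written to a file keyed by a hash of the URL, and the scraper is only called on a cache miss. The function name `cached_scrape`, the `scrape_cache` directory, and the JSON layout are all assumptions for illustration; `scrape_fn` is whatever scraping function is already in use.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

def cached_scrape(url: str, scrape_fn: Callable[[str], str],
                  cache_dir: str = "scrape_cache") -> str:
    """Return scraped text for `url`, calling `scrape_fn` only on a cache miss."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()  # stable file name per URL
    entry = cache_path / f"{key}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))["text"]
    text = scrape_fn(url)
    entry.write_text(json.dumps({"url": url, "text": text}), encoding="utf-8")
    return text
```

A cache like this never expires on its own, so a real pipeline would also need a rule for when a stored page is considered stale.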
Vivek Singhal
11:12
This code is related to our course, so we are not able to provide it unless you enroll
for other webinars not related to course we do provide code on demand
Ganesh Venkataramanan
11:13
right, you are doing it in an async model here then @mystsocial. But reading the PDF, creating the chunks, and creating the vector DB should be taking some time...
Myst Social
11:14
milli-seconds only
Surya Putchala
11:15
Diagnosis is not simple...
Myst Social
11:15
see, scraping the data and vectorizing it directly is one approach; scraping, saving the data, and then vectorizing it is the other
Surya Putchala
11:15
It has to first go through differential diagnosis. Google DeepMind is currently building Knowledge Graphs of symptoms, conditions, etc.
Vivek Singhal
11:16
In CellBot we are working on Diagnostic, Research and Drug Discovery workflows - these are long term complex programs and use state of art Gen AI
Myst Social
11:16
it depends on your need which one you want to use; I just explained another way out
Surya Putchala
11:16
However, Med-PaLM 2 can ace USMLE
Vivek Singhal
11:16
We are also building Knowledge Graphs for healthcare and will also cover them in a later class of this Health AI course
Surya Putchala
11:17
cool
xdfvtt
11:17
Great Vivek
Vivek Singhal
11:18
Regardless of course enrollment we invite expert Gen AI developers and Healthcare AI folks to collaborate with us on moonshot Gen AI for healthcare stack - please contact me at 9742800566
Surya Putchala
11:19
Vivek, I will get in touch with you
Vivek Singhal
11:19
Sounds good Surya !
Surya Putchala
11:19
I am very passionate about HealthCare AI
Vivek Singhal
11:19
I can see that :)
xdfvtt
11:19
Vivek, I will touch base with you
Vivek Singhal
11:20
Sure Amlan
Sasi kiran
11:23
are you dealing with only text? Any graphs or tables from papers?
Myst Social
11:23
it's multimodal, so I think everything will be taken care of
xdfvtt
11:24
that is correct
Vivek Singhal
11:24
in this project and CellBot so far text only. Images and graphs on product roadmap
Sasi kiran
11:24
Tx Vivek
Surya Putchala
11:25
Can you please restate the problem Kadam Java ji?
Surya Putchala
11:27
There are some simple solutions Dr. kadam Jave that you should try before getting too much into the software development.
Surya Putchala
11:30
yes, what is the question or problem here?
Surya Putchala
11:34
okay, attribution and source?
Diwakar Sinha
11:34
and if you have a pmid you can google it
Surya Putchala
11:34
Is the interpretability, explainability the issue?
Vivek Singhal
11:35
Kindly let's move the class forward - we are running over time
Surya Putchala
11:35
let's move on pls
For these specific Qs, they can reach out to you later. Every minute spent is 48 man-minutes, please
Vivek Singhal
11:36
Vivek Singhal
11:38
upcoming classes for this course - presented alternate Saturdays https://cellstrathub.com/course/gen-ai?cardId=2
xdfvtt
11:44
Agentic workflow is another animal
Myst Social
11:45
it's not animal it's an architecture :-D
Indrajit CS
11:46
:)
xdfvtt
11:46
:)
Indrajit CS
11:46
We are going to bring that animal here soon
xdfvtt
11:47
cool man
Sasi kiran
11:49
How do you run the scraper automatically as new data from PubMed comes, and load it automatically into Weaviate?
Myst Social
11:52
"as new data from PubMed comes" --- what does this mean?
Vivek Singhal
11:52
it pulls new papers
based on topic searched.
our product https://imagineview.com/cellbot-landing?title=CELLBOT pulls 50 or 100 most relevant papers including latest papers if relevant
Myst Social
11:53
If I am not wrong, he is asking how the scraper automatically pulls the new data added to PubMed
is that the question?
Sasi kiran
11:54
cool, as the user searches it pulls relevant info
Ashta Sat
11:54
geofencing
Vivek Singhal
11:54
if you are health or pharma professional, we welcome you to join our Healthcare AI Special Interest Group - do send me whatsapp at 9742800566 to be added to Health AI SIG
Surya Putchala
11:57
Vivek, Please add me to the SIG
Surya Putchala
11:58
Sourcing is a challenge!
What if we have to get data from NHS
NHS UK is different than NHS Ireland, NHS canada....it is a mess!
Vivek Singhal
11:59
If NHS has API or bulk data access, it would be on our roadmap
Surya Putchala
11:59
Going the BeautifulSoup route will literally kill you!
As Dolcy is pointing out, APIs are a good way, but they kind of limit what we can do.
Ganesh Venkataramanan
12:00
but the API should be enabled by the site, like Wikipedia has enabled it.... not all sites will do that
is BeautifulSoup the only option we have, or do we have any options other than this? We generally use the class or the paragraph id to extract the info
Vivek Singhal
12:01
yes we are limited by data access APIs
Ganesh Venkataramanan
12:01
i usually do that
sometimes we also get a response status of 999, which means that the data is not accessible
we should also remember this
Vivek Singhal
12:02
Entrez is the official PubMed miner by the PubMed team
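Entrez is exposed programmatically through NCBI's E-utilities, so a topic search of the kind discussed in this thread can be expressed as a plain HTTPS request instead of HTML scraping. The sketch below only builds the `esearch` request URL and makes no network call; the function name `build_pubmed_search_url` and the default of 50 results are assumptions for illustration, not the course code.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(topic: str, max_results: int = 50) -> str:
    """Build an esearch URL returning up to `max_results` PubMed ids for `topic`."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": topic,          # free-text query
        "retmax": max_results,  # cap on the number of ids returned
        "retmode": "json",      # ask for a JSON response
        "sort": "relevance",    # most relevant papers first
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

if __name__ == "__main__":
    print(build_pubmed_search_url("encephalitis treatment", max_results=100))
```

The ids returned by `esearch` would then be passed to `efetch` to retrieve abstracts; NCBI asks clients to respect its rate limits, so heavy use normally goes with an API key.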
Surya Putchala
12:02
The world currently is not open. The information is siloed. This is one of the biggest challenges in getting Medical data.
Vivek Singhal
12:02
right Surya
Surya Putchala
12:03
Not just this, the data is locked in within the healthcare institutions.... data which would be helpful for achieving superior patient outcomes.
We can only know these things and hope someone will fix it. Till then, we will be helpless!
Surya Putchala
12:04
There is a problem with the sources as well.
Different studies give different answers.
Which one is golden truth??
Vivek Singhal
12:05
I think the health community is coming around to AI and tech interventions
Surya Putchala
12:06
This is NOT a LLM problem. This is a serious domain problem. We are at least fortunate in the US that the EHRs are standardized.
But the medical information.... is hazardous to base diagnostic decisions on
Vivek Singhal
12:07
Our CellBot is focussed on global markets particularly USA. We are at start of this long journey
Surya Putchala
12:07
yes, huge scope
Vivek Singhal
12:07
I have been doing the rounds of the Bay Area for the last 1 year and have put up booths at events there
yes agree Surya - scope is massive
Indrajit CS
12:08
Agree Surya, Lot of hospitals have this patient diagnostic data which we won't have access to.
Surya Putchala
12:08
They neither do anything useful with it nor let us do anything with it :-(
Vivek Singhal
12:08
Indrajit CS
12:08
This can give doctors access to resources that are very useful and reliable.
yes , that's the sad part

Surya Putchala
12:09
...and every Tom, Dick and Harry hospital in India claims "research"....but can't unlock their data
Anirved Pandey
12:10
is there any sort of guard railing applied in here?
Surya Putchala
12:10
No guardrails as of now, to the best of my knowledge
Anirved Pandey
12:10
so anyone can ask anything regardless of the data mined?
Surya Putchala
12:12
Those who are interested in this field, check this out : https://sites.research.google/med-palm/
Anirved Pandey
12:12
Also I'm trying cellbot but it is not giving any response. I asked what is encephalitis but it is still loading.

Ganesh Venkataramanan
12:13
wonderful session....thanks for all your patience and response
Vivek Singhal
12:13
can you send me a screenshot on WhatsApp
Anirved
pricing page
Ganesh Venkataramanan
12:13
its altogether a new lesson
