Senior Data Scientist at Fractal | Gen AI | LLM
5 Years of Experience
Chennai, Tamil Nadu, India
Data Scientist with five years of experience delivering solutions using NLP, OpenCV, and large language models (LLMs). I prioritize building a conceptual and intuitive understanding of a technique before applying it; I believe this sharpens my problem-solving instincts. This approach has let me keep pace with the fast-moving NLP landscape and the advances in large language models, which I follow closely.

I am deeply engaged in developing products intended to substantially shorten the time between drug development and market availability. These products automate, assist, and streamline the intricate process of coding SDTM and ADaM datasets and tables, listings, and figures for clinical trial report submissions. This work requires a thorough understanding of clinical trials, SAS programming, and the underlying statistics, and it has made me proficient in SAS and R in addition to Python. I have developed machine learning and deep learning solutions that efficiently automate the processes involved in converting raw clinical trial datasets into FDA-compliant statistical reports.

I am currently exploring how open-source large language models such as Llama 2 and Falcon can enhance this automation while safeguarding the privacy of confidential data and controlling costs, given the high price of GPUs. I am actively applying techniques such as instruction fine-tuning with QLoRA, LoRA, and adapters; prompt tuning; Retrieval-Augmented Generation (RAG); quantization; and converting models to the GGUF format so they can run on CPUs.
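The retrieval step of the Retrieval-Augmented Generation work mentioned above can be sketched as follows. This is a minimal illustration, not the production pipeline: the hypothetical `embed` function is a toy bag-of-words counter standing in for a real embedding model and vector store. Chunks are ranked by cosine similarity to the question, and the top matches are prepended to the prompt sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lower-cased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "SDTM organizes clinical trial data into domains such as AE and DM.",
    "ADaM datasets are derived from SDTM for statistical analysis.",
    "GGUF is a file format used by llama.cpp.",
]
print(build_prompt("Which format does llama.cpp use?", docs, k=1))
```

Keeping retrieval separate from generation means the document store can stay on private infrastructure and only the few retrieved chunks reach the model.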
Skills: Python, Machine Learning, NLP, OpenCV, Hugging Face Transformers, PyTorch, Keras, TensorFlow, Deep Learning, Large Language Models, LangChain, LlamaIndex, Cython, R, SAS, Statistics, Jinja2, pandas, Dask, Matplotlib, LiLT, LayoutLM, Autoencoders.
Fractal, SaaS/Cloud Product, Computer Software
Job Title : Senior Data Scientist
Company name : Fractal
Period : November 2023 - Present
Summary : Gen AI specialist
Job Title : Data Scientist
Company name : Symbiance
Period : January 2020 - November 2023
Summary : ZYLIQ AI (Application to Automate Clinical Study Report Generation using AI)
>> Narrative text generation: Fine-tuned a Mistral Instruct model with LoRA to generate narratives for reported adverse events from the subject data collected in the clinical trial (JSON data -> narrative describing the event).
>> Prompt tuning: Designed a prompt and an output grammar so that the LLM returns output in the desired structure.
>> Model deployment on AWS Lambda (CPU): Deployed a 5-bit quantized Mistral Instruct model (Q5_K_M, GGUF) on Lambda using llama.cpp and LangChain, reducing the cost per token for bulk processing.
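The LoRA technique named in the narrative-generation item freezes a pretrained weight matrix W and trains only a low-rank update B·A, scaled by alpha/r, so the layer computes h = W x + (alpha/r) B A x. A minimal pure-Python sketch of that forward pass and of the trainable-parameter arithmetic follows; the 4096×4096 matrix and rank r = 8 are illustrative assumptions, not the settings used in the project.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)               # frozen pretrained path, length d
    delta = matvec(B, matvec(A, x))   # low-rank path: A is r x k, B is d x r
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d, k, r):
    """Parameters updated by full fine-tuning vs. LoRA for one d x k matrix."""
    return d * k, r * (d + k)


full, lora = trainable_params(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

B is normally initialized to zeros, so before training the adapted layer returns exactly W x. QLoRA applies the same update while keeping the frozen W in 4-bit precision.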
Simplestats AI (Application to generate tables and graphs automatically)
>> Model fine-tuning: Fine-tuned the openlm-research/open_llama_7b model on an A10G GPU with locally curated instruction data to generate Python queries from the available variables and their unique values.
>> Token classification (NER): Identified questions, answers, titles, and other categories in documents using LiLT (Language-Independent Layout Transformer), a layout-aware alternative to LayoutLMv2, from the Hugging Face Transformers library. Annotated the documents in CVAT and merged the annotations with word coordinates extracted by the pdfplumber library to create the training data.
>> Multiclass text classification: Classified sentences into the appropriate class (classes defined dynamically) using an ensemble of embedding similarity (dot product), a random forest, and fuzzy logic.
>> Generated customized high-resolution figures from clinical trial data using R and ggplotly.
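The ensemble in the multiclass classification item can be sketched as a weighted sum of per-class scores. This is a minimal sketch under stated assumptions: the sentence and class embeddings are precomputed toy vectors, the random-forest probabilities arrive as a dict keyed by class name (for example, mapped from scikit-learn's `predict_proba` output), `difflib.SequenceMatcher` stands in for the fuzzy-logic component, and the weights are hypothetical.

```python
from difflib import SequenceMatcher


def dot(u, v):
    """Dot-product similarity (equals cosine similarity for L2-normalized vectors)."""
    return sum(a * b for a, b in zip(u, v))


def ensemble_predict(sentence, sentence_vec, class_vecs, rf_proba, weights=(0.5, 0.3, 0.2)):
    """Combine embedding similarity, random-forest probability, and fuzzy string
    similarity into one score per class; return the best class and all scores."""
    w_emb, w_rf, w_fuzzy = weights  # hypothetical weights
    scores = {}
    for label, class_vec in class_vecs.items():
        emb = dot(sentence_vec, class_vec)
        rf = rf_proba.get(label, 0.0)  # classes unseen by the forest contribute 0
        fuzzy = SequenceMatcher(None, sentence.lower(), label.lower()).ratio()
        scores[label] = w_emb * emb + w_rf * rf + w_fuzzy * fuzzy
    return max(scores, key=scores.get), scores


class_vecs = {"adverse event": [0.9, 0.1], "demographics": [0.1, 0.9]}
rf_proba = {"adverse event": 0.7, "demographics": 0.3}
label, scores = ensemble_predict(
    "Subject reported a mild adverse event", [0.8, 0.2], class_vecs, rf_proba
)
print(label)  # adverse event
```

One plausible motivation for this mix when classes are dynamic: the embedding and fuzzy scores can rank a newly added class immediately, while the random forest only contributes for classes it was trained on.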
ALPS AI (Tool to create datasets from the available SDTM standards)
>> Implemented an autoencoder architecture in Keras to identify and separate text blocks in clinical report forms (PDF).
>> Hierarchical text classification: Implemented a classifier-chain approach that first identifies the domain and then identifies the variable within that domain.
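The two-level idea in the hierarchical classification item can be sketched as follows. This is a simplified, hypothetical top-down version: a toy keyword-overlap classifier (`keyword_classifier`) stands in for trained models, and the domain prediction selects which variable classifier runs next. The AE/DM domains and variable names are standard SDTM examples chosen for illustration, not the project's actual label set.

```python
def keyword_classifier(keywords):
    """Hypothetical toy classifier: picks the label whose keywords overlap most
    with the text. A trained model would replace this in practice."""
    def predict(text):
        tokens = set(text.lower().split())
        return max(keywords, key=lambda label: len(tokens & keywords[label]))
    return predict


# Stage 1: domain classifier.
domain_clf = keyword_classifier({
    "AE": {"adverse", "event", "severity", "onset"},
    "DM": {"birth", "sex", "age", "race"},
})

# Stage 2: one variable classifier per domain; only the predicted domain's runs.
variable_clfs = {
    "AE": keyword_classifier({
        "AETERM": {"event", "term"},
        "AESEV": {"severity", "mild", "severe"},
        "AESTDTC": {"onset", "start", "date"},
    }),
    "DM": keyword_classifier({
        "SEX": {"sex", "gender"},
        "AGE": {"age"},
        "BRTHDTC": {"birth", "date"},
    }),
}


def classify(text):
    """Predict the domain first, then the variable within that domain."""
    domain = domain_clf(text)
    return domain, variable_clfs[domain](text)


print(classify("Date of birth"))             # ('DM', 'BRTHDTC')
print(classify("Adverse event onset date"))  # ('AE', 'AESTDTC')
```

Because "date" is a keyword for both AESTDTC and BRTHDTC, a flat classifier over all variables could confuse them; predicting the domain first restricts the second step to that domain's variables.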
Job Title : Data Scientist
Company name : Ratilan Technologies
Period : April 2019 - December 2019
Summary : Part of Symbiance
Location : Chennai Area, India
Job Title : Associate Software Engineer
Company name : Accenture
Period : June 2015 - August 2016