How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution

By NextTech, January 9, 2026 (4 min read)
In this tutorial, we demonstrate how we use Ibis to build a portable, in-database feature engineering pipeline that looks and feels like Pandas but executes entirely inside the database. We show how we connect to DuckDB, register data safely inside the backend, and define complex transformations using window functions and aggregations without ever pulling raw data into local memory. By keeping all transformations lazy and backend-agnostic, we demonstrate how to write analytics code once in Python and rely on Ibis to translate it into efficient SQL.

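!pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas


import ibis
from ibis import _


print("Ibis version:", ibis.__version__)


# In-memory DuckDB database; every expression below is executed by this backend.
con = ibis.duckdb.connect()
# Interactive mode runs an expression when it is displayed in a notebook.
ibis.options.interactive = True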

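We install the required libraries and initialize the Ibis environment. We establish a DuckDB connection and enable interactive mode, which previews a result whenever an expression is displayed in the notebook. The expressions themselves stay lazy: nothing runs until we display or execute them, and when they do run, the work happens in the backend.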

# Load the example dataset into DuckDB directly when this Ibis version supports it.
try:
    base_expr = ibis.examples.penguins.fetch(backend=con)
except TypeError:
    base_expr = ibis.examples.penguins.fetch()


# Register the data under a stable name unless the fetch already created it.
if "penguins" not in con.list_tables():
    try:
        con.create_table("penguins", base_expr, overwrite=True)
    except Exception:
        con.create_table("penguins", base_expr.execute(), overwrite=True)


t = con.table("penguins")
print(t.schema())

We load the Penguins dataset and explicitly register it inside the DuckDB catalog to ensure it is available for SQL execution. We verify the table schema and confirm that the data now lives inside the database rather than in local memory.

def penguin_feature_pipeline(penguins):
    # Row-level derived features.
    base = penguins.mutate(
        bill_ratio=_.bill_length_mm / _.bill_depth_mm,
        is_male=(_.sex == "male").ifelse(1, 0),
    )


    # Drop rows with missing measurements or grouping keys.
    cleaned = base.filter(
        _.bill_length_mm.notnull()
        & _.bill_depth_mm.notnull()
        & _.body_mass_g.notnull()
        & _.flipper_length_mm.notnull()
        & _.species.notnull()
        & _.island.notnull()
        & _.year.notnull()
    )


    # Window over each species, and a trailing row-based window per island ordered by year.
    w_species = ibis.window(group_by=[cleaned.species])
    w_island_year = ibis.window(
        group_by=[cleaned.island],
        order_by=[cleaned.year],
        preceding=2,
        following=0,
    )


    feat = cleaned.mutate(
        species_avg_mass=cleaned.body_mass_g.mean().over(w_species),
        species_std_mass=cleaned.body_mass_g.std().over(w_species),
        mass_z=(
            cleaned.body_mass_g
            - cleaned.body_mass_g.mean().over(w_species)
        ) / cleaned.body_mass_g.std().over(w_species),
        island_mass_rank=cleaned.body_mass_g.rank().over(
            ibis.window(group_by=[cleaned.island])
        ),
        rolling_3yr_island_avg_mass=cleaned.body_mass_g.mean().over(
            w_island_year
        ),
    )


    # Aggregate the row-level features to one row per species, island, and year.
    return feat.group_by(["species", "island", "year"]).agg(
        n=feat.count(),
        avg_mass=feat.body_mass_g.mean(),
        avg_flipper=feat.flipper_length_mm.mean(),
        avg_bill_ratio=feat.bill_ratio.mean(),
        avg_mass_z=feat.mass_z.mean(),
        avg_rolling_3yr_mass=feat.rolling_3yr_island_avg_mass.mean(),
        pct_male=feat.is_male.mean(),
    ).order_by(["species", "island", "year"])

We define a reusable feature engineering pipeline using pure Ibis expressions. We compute derived features, apply data cleaning, and use window functions and grouped aggregations to build advanced, database-native features while keeping the entire pipeline lazy.
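To make the two window features concrete, here is a small plain-Python sketch of what mass_z and the trailing island average compute. It is an illustration only, not how Ibis or DuckDB evaluate them: the sample rows and the helper names add_mass_z and add_trailing_mean are made up for this example. It assumes the sample standard deviation, which to our knowledge is the default for std() in Ibis, and it treats preceding=2 as a count of rows rather than years, so with several penguins per island and year the frame spans three rows, not three calendar years.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up sample rows (not the real Penguins data).
rows = [
    {"species": "Adelie", "island": "Dream", "year": 2007, "body_mass_g": 3000},
    {"species": "Adelie", "island": "Dream", "year": 2008, "body_mass_g": 3400},
    {"species": "Adelie", "island": "Dream", "year": 2009, "body_mass_g": 3800},
    {"species": "Gentoo", "island": "Biscoe", "year": 2007, "body_mass_g": 5000},
    {"species": "Gentoo", "island": "Biscoe", "year": 2008, "body_mass_g": 5200},
]


def add_mass_z(rows):
    """Per-species z-score: (mass - species mean) / species sample std."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["species"]].append(r["body_mass_g"])
    stats = {k: (mean(v), stdev(v)) for k, v in groups.items()}
    out = []
    for r in rows:
        mu, sigma = stats[r["species"]]
        out.append({**r, "mass_z": (r["body_mass_g"] - mu) / sigma})
    return out


def add_trailing_mean(rows, preceding=2):
    """Per-island mean over the current row and up to `preceding` earlier rows."""
    by_island = defaultdict(list)
    # Order by year inside each island, like order_by in the window.
    for r in sorted(rows, key=lambda r: r["year"]):
        by_island[r["island"]].append(r)
    out = []
    for group in by_island.values():
        for i, r in enumerate(group):
            frame = group[max(0, i - preceding): i + 1]
            out.append({**r, "trailing_mean": mean(x["body_mass_g"] for x in frame)})
    return out


for r in add_mass_z(rows):
    print(r["species"], r["year"], round(r["mass_z"], 3))
for r in add_trailing_mean(rows):
    print(r["island"], r["year"], r["trailing_mean"])
```

If a true three-year window is needed, one option is to aggregate to one row per island and year first and apply the trailing window to that smaller table.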

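from IPython.display import display


features = penguin_feature_pipeline(t)
# Print the DuckDB SQL that Ibis generates for the whole pipeline.
print(con.compile(features))


# Execute in DuckDB and fetch only the aggregated result.
try:
    df = features.to_pandas()
except Exception:
    df = features.execute()


display(df.head())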

We invoke the feature pipeline and compile it into DuckDB SQL to validate that all transformations are pushed down to the database. We then run the pipeline and return only the final aggregated results for inspection.

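# Materialize the features as a table inside DuckDB.
con.create_table("penguin_features", features, overwrite=True)


feat_tbl = con.table("penguin_features")


# Preview the first rows; only these 10 rows leave the database.
try:
    preview = feat_tbl.limit(10).to_pandas()
except Exception:
    preview = feat_tbl.limit(10).execute()


display(preview)


# Export the table to Parquet (the path assumes a Colab-style /content directory).
out_path = "/content/penguin_features.parquet"
con.raw_sql(f"COPY penguin_features TO '{out_path}' (FORMAT PARQUET);")
print(out_path)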

We materialize the engineered features as a table directly inside DuckDB and query it lazily for verification. We also export the results to a Parquet file, demonstrating how we can hand off database-computed features to downstream analytics or machine learning workflows.

In conclusion, we built, compiled, and executed an advanced feature engineering workflow fully inside DuckDB using Ibis. We demonstrated how to inspect the generated SQL, materialized results directly in the database, and exported them for downstream use while preserving portability across analytical backends. This approach reinforces the core idea behind Ibis: we keep computation close to the data, minimize unnecessary data movement, and maintain a single, reusable Python codebase that scales from local experimentation to production databases.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
