Data Engineer, Graph Team

New York, United States Full-time

Applecart deploys proprietary technology to run smarter advertising campaigns. We work with some of the nation’s most prominent corporations, non-profit organizations and political candidates to activate and communicate with key target audiences at a scale and level of efficacy previously thought impossible. Part high-level strategic consultancy, part cutting-edge data R&D lab, Applecart offers proven solutions derived from objective, iterative experimentation. Our core offering is a proprietary social graph that leverages publicly available data to map real-world relationships between individuals at national scale. Our roots are in politics, where we have tested and honed our methods at every level to give our clients a proven technological edge. We’re branching out beyond political campaigns to tackle new advertising challenges in which determining “who knows whom” provides decisive advantages for our clients.

Applecart’s political work has been featured by The Colbert Report, CNN, The Washington Post, The Associated Press, USA Today and The Huffington Post, among other prominent news outlets.

As a Data Engineer on our engineering team, you will be responsible for sourcing, enhancing and integrating data sources into our social graph, and for optimizing the graph to support powerful machine learning applications. Your work will directly affect our clients by influencing election outcomes, increasing political and non-profit fundraising yields, and optimizing advertising spend and risk assessments.

Responsibilities:

  • Collaboratively architect, build and launch new social graph components that enhance profiles and increase coverage and edge accuracy.
  • Create, maintain, and scale data pipelines between and for data ingesters, the social graph, machine learning predictors, client deliverables, and data warehousing.
  • Interact cross-functionally with a wide variety of people and teams. Work closely with client services and data engineers to identify opportunities and assess improvements to Applecart products and deliverables.
  • Implement systems for monitoring of streaming and batch data processing (e.g. DataDog, Nagios). Track data quality and consistency.
  • Evangelize solid coding practices (e.g. test-driven development, code reviews, continuous deployment, automated linting, staging environments).
  • Contribute to the architectural designs and decision making around data stores, schemas, data security and cloud storage.
  • Rapidly prototype proof-of-concept data pipelines for social graph ROI determination.
  • Keep abreast of industry trends, best practices, and emerging methodologies.
  • Support quality assurance as part of the engineering process and collaborate with product managers, for example by producing sampled outputs, proofreading pull requests, supporting KPIs, and outlining PR limitations and future improvements.

Basic Qualifications:

  • BS or MS degree in Computer Science, Math, Statistics or another technical field.
  • 2-3+ years of applied software engineering experience (especially startups, Python).
  • Python Expertise: classes & inheritance, map & filter functions, list comprehensions, generators, decorators, style guides, pylint, pytest, pdb.
  • SQL/Hive Expertise: where clauses, joins, group bys, windowing functions, exploding.
  • Spark Expertise: SparkSQL, Caching, Checkpointing, DataFrames, RDDs.
  • Expertise in building and maintaining reliable ETL jobs.
  • Ability to write well-abstracted, reusable, object-oriented code components.
  • Enjoy working in a fast-paced, highly collaborative and ambitious startup environment.
  • Understanding of summary statistics and basic mathematical modeling.
  • Experience working in teams and packaging and deploying code in a production setting.
  • Experience with Amazon Web Services (RDS, S3, EC2, EMR, Data Pipeline).

Preferred Qualifications:

  • Experience with open source search platforms such as Solr, Elasticsearch or similar.
  • Background in wrangling structured and unstructured data sets and consuming APIs (e.g. handling rate limiting and exponential back-off).
  • Knowledge of graph storage and computation frameworks (e.g. GraphX, TitanDB, Neo4j).
  • Familiarity with Scala and/or Java, Apache Spark internals and job optimization.
  • Experience with a variety of coding projects, such as browser extensions, full-stack development, web scraping and Mechanical Turk automation.
  • Significant interest or background in politics, advertising technology and/or behavior modeling is a big plus.

Apply for this opening at http://applecart.recruiterbox.com/jobs/fk0fuqw?apply=true