Perfection Kills

Javascript rants and findings, by kangax



My Fitness: from spreadsheet to an app 3 Oct 2025 4:00 PM

My Fitness: from spreadsheet to an app

My fitness journey—as is the case with many teenagers—began with bodybuilding. I wanted to look good. Soon after, I found StrongLifts 5x5 and got into powerlifting. It became all about numbers: getting a bigger bench, bigger squat, bigger press. Yet, I've always been drawn to the notion of Total Athleticism, as coined by Max Shank in one of the articles on T-Nation that I used to read religiously around 2015¹. Having a 300 bench was cool but I didn't want to be one of those powerlifters who had massive numbers yet couldn't run up the stairs. I wanted to also be good at running, calisthenics, kettlebells. Eventually this brought me to "functional fitness" and, of course, CrossFit, which popularized it circa 2000.

CrossFit standards

In my fitness circles, CrossFit was still criticized for its reckless high-skill Olympic movements performed at high intensity. Blame the epic fail videos of someone doing something stupid and the ignorance around the methodology. While I was on the fence about doing actual CrossFit, I loved the "variable movements" concept. I found a couple of "CrossFit athlete standards" posters online and made this spreadsheet to track my progress across multiple domains. It immediately exposed all my gaps: I could squat 2x bodyweight but my snatch was at a measly 100lb and all the speed and work capacity tests were barely at level 2:

If I wanted to be an all-around developed athlete, these were the things I had to work on. The standards also served as an "objective" benchmark: to consider yourself "advanced", here's how many pull-ups you had to be able to do, and this is how fast your 1-mile run had to be. It gave me a concrete goal to work towards. These spreadsheets became my north star for the following few years. For a challenge junkie like me, they were a perfect long-term obsession.

Strength and Skill

The spreadsheet overload was real. This wasn't the first one I used. As far back as 2011, I found the exrx.net Strength Standards and created this view to understand where I stood strength-wise and what to work on:

In the last couple of years, I started tracking the frequency and total lifetime session count of certain movements I wanted to be better at — a concept I wrote about before:

Finally, I tracked my proficiency levels on various CrossFit-specific movements as a way to advance my skill and become fluent in them during WODs.

One app to rule them all

The year was 2025.
I was a software engineer.
And I would still manually update a spreadsheet with the number of times I'd performed a certain movement that I deemed as "needing practice".

This was embarrassing.

When I embarked on building PRzilla, I realized that perhaps this was the time to ditch manual spreadsheet tracking. I could now build an app that would have all of this baked in:

Whoop, Apple Fitness, and the rise of quantified fitness

I use Whoop and I absolutely love how it's able to distill complex/boring HRV/RHR metrics into simple, quantified scores like recovery and strain. Whoop and Apple Fitness—which is just as big on quantified fitness—were a big motivation for this app.

On the other hand, there are apps like BTWB, one of the most extensive CrossFit-style workout tracking tools, but I found its UI unintuitive and its UX clunky:

A snapshot of your fitness

And so I turned all my spreadsheets into a simple snapshot consisting of 5 views: your level, balance, strength, ownership, and practice. These could be easily extended in the future with other modules: time domain distribution, tracking of specific goals like work capacity or endurance, or even sport-specific ones like Hyrox.

SugarWOD parser

In order to turn all my workout data into beautiful charts, I needed to… have that data in the first place. One issue was that it was split between:

  • SugarWOD
  • wodwell.com
  • the Strong app

Importing wodwell scores was easy since it was just a map of common wod (fran, murph, etc.) to a score. Strong app export would be a lot more involved since I would have to implement sets and reps tracking (as well as workout sessions, potential rest values, and so on). So I decided to focus on SugarWOD import. And this is where the fun began.

Good news: SugarWOD allows easy export of all of your workout history.
Bad news: SugarWOD data is very… unstructured.


Here's an example CSV row:

09/14/2022,WOD,24 Minute AMRAP:Row 240m12 Lateral Burpees over Back of Rower48 Double Unders24 Alternating Front Foot Elevated Reverse Lunges (53/35)*,3.073,3+73,Rounds + Reps,,"[{""rnds"":3,""reps"":73}]",,SCALED,

As you can see, we have an arbitrary, potentially non-descriptive wod title like "WOD" plus a jumbled-together plain-text description like "24 Minute AMRAP:Row 240m12 Lateral Burpees over Back of Rower48 Double Unders24 Alternating Front Foot Elevated Reverse Lunges (53/35)*" that's missing basic formatting / newlines.

As humans, we’re able to quickly parse this into:

24 Minute AMRAP:
Row 240m
12 Lateral Burpees over Back of Rower
48 Double Unders
24 Alternating Front Foot Elevated Reverse Lunges (53/35)*

Thank god we live in the age of LLMs, which are capable of reasoning through messy, jammed-up text like this just like we—humans—do.
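As an illustration of that step (not PRzilla's actual prompt or model), restoring the missing newlines can be as simple as one small, targeted LLM call, assuming an OpenAI-style client:

```ts
import OpenAI from "openai";

// Hypothetical helper: ask a model to re-insert the newlines SugarWOD stripped out.
const client = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

async function reformatWodDescription(raw: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You are given a CrossFit workout description with missing newlines. " +
          "Return the same text with one movement per line. Do not add or remove words.",
      },
      { role: "user", content: raw },
    ],
  });
  return response.choices[0]?.message?.content ?? raw;
}
```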

Another peculiarity was load-based entries. In SugarWOD you can program them in a workout and specify sets and reps, e.g. 5 sets of 3 snatches. Users can then log a value for each of the 5 sets. In order to present your performance on the leaderboard, SugarWOD allows coaches to specify how to score those sets — max value (who got the highest weight)? lowest value (who got the fastest row time)? sum of all values (who did the most work overall)? and so on. The export doesn't expose this scoring criteria, so the parser needs to infer it based on the sets data. In the example below, we can see that the scoring used the SUM of 12 sets, and so 695 is not the weight the user did as a 1RM squat snatch; the real squat snatch values are in the sets field:

05/31/2024,WOD,12 ROUNDS:30 Second CAP:3 Toes to Bar2 Lateral Barbell Burpees1 Squat Snatch*REST 1 Minute.*Increase weight as able.,695,695,Load,,"[{""success"":true,""load"":55},{""success"":true,""load"":55},{""success"":true,""load"":55},{""success"":true,""load"":55},{""success"":true,""load"":55},{""success"":true,""load"":55},{""success"":true,""load"":60},{""success"":true,""load"":60},{""success"":true,""load"":60},{""success"":true,""load"":60},{""success"":true,""load"":60},{""success"":true,""load"":65}]",,RX,

In this case it's "obvious" that 695 wasn't a 1RM snatch (the current world record is 496 lb), but some cases are much less obvious, so the parser needs to be very careful there.
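A sketch of how that inference can be sanity-checked (my illustration, not PRzilla's actual parser): compare the reported score against simple aggregates of the per-set loads.

```ts
type WodSet = { success?: boolean; load: number };

// Sketch: guess how a load-based score was aggregated from its sets.
// If the reported score matches the sum of the set loads, it's almost certainly
// a SUM score (like the 695 above), not a 1RM.
function inferScoring(reportedScore: number, sets: WodSet[]): "sum" | "max" | "min" | "unknown" {
  const loads = sets.map((s) => s.load);
  const sum = loads.reduce((a, b) => a + b, 0);
  if (reportedScore === sum) return "sum";
  if (reportedScore === Math.max(...loads)) return "max";
  if (reportedScore === Math.min(...loads)) return "min";
  return "unknown";
}

// Twelve sets of 55/60/65 add up to 695, so the score above is a SUM, not a snatch 1RM.
```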

LLM-powered pipeline

And so after many weeks of experimenting, refining, rewriting, and adjusting based on real data (thanks to amazing volunteers in my gym), I now have a pretty smart and capable pipeline that turns unstructured SugarWOD data into structured PRzilla data:

One of the biggest findings—and things that slowed me down—was realizing that a giant, monolithic LLM prompt used to generate one giant JSON with a dozen different fields (gpp, modality, difficulty, benchmarks, classification, etc.) was taking way too much time, was way too expensive, and often produced errors as it tried to do too many things at once.

Parallelization for the win

I then switched to a series of small, targeted LLM parsers/prompts—as seen in the diagram above—for each of the metrics and ran them in parallel. The results were astonishing: faster and cheaper execution and much more accurate results. This also gave me the flexibility to run only specific parsers in specific cases; e.g. when parsing your historic data we want to extract movements (to feed into our proficiency metrics), performance levels (to understand how your fitness level progressed), time domain (to see a time domain breakdown), and so on. We don't care about the coaching/scaling/stimulus modules since those workouts are in the past and users don't need that! However, those modules are important for new workouts, when using analyze or generate.

Finally, it allowed me to run these parsers in parallel, which meant that a WOD analysis now takes time_of_slowest_module (usually ~12-15 sec) rather than the sequential SUM(module1, module2, …) that would usually take up to 40 sec!
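The orchestration itself is essentially a Promise.all over the relevant parsers; a simplified sketch with hypothetical module names:

```ts
// Run only the parsers relevant for this import, in parallel.
// Total latency is roughly the slowest module, not the sum of all of them.
async function analyzeWod(description: string, score: string) {
  const [movements, performanceLevels, timeDomain] = await Promise.all([
    parseMovements(description),                 // feeds proficiency/practice metrics
    parsePerformanceLevels(description, score),  // L1-L10 bands
    parseTimeDomain(description),                // short vs long WODs
  ]);
  return { movements, performanceLevels, timeDomain };
}

// Hypothetical parser signatures, each wrapping its own small, targeted prompt:
declare function parseMovements(d: string): Promise<string[]>;
declare function parsePerformanceLevels(d: string, s: string): Promise<Record<string, string>>;
declare function parseTimeDomain(d: string): Promise<"short" | "medium" | "long">;
```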

Unstructured to structured

The end result is incredible. We're able to turn a textual mess like this into an actual workout and your actual performance. Here we see that the 14min AMRAP was properly parsed into movements like "Wall Walk" and "Front Squat"; that it's an endurance- and stamina-heavy workout (yep!); that it's classified as "Very Hard" and its modality is split equally between "Gymnastics" and "Weightlifting". Moreover, AI determined that the user's score of 80 falls right around L3 (which we would likely adjust to L3.5 or L4 due to the workout's difficulty):

Parsing "Wall Walk" as a movement is what allows us to count it towards your practice score! Notice that we now know you've done "Wall Walk" 32 times in your life, with the most recent one being 6 months ago.
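Once every imported entry carries a parsed movements list, those practice stats fall out of a simple aggregation (a sketch; the field names are illustrative, not PRzilla's schema):

```ts
type ParsedEntry = { date: string; movements: string[] }; // date as ISO string

// Sketch: lifetime session count and recency per movement,
// e.g. "Wall Walk" -> { count: 32, lastDone: <date of the most recent session> }.
function practiceStats(entries: ParsedEntry[]) {
  const stats = new Map<string, { count: number; lastDone: string }>();
  for (const entry of entries) {
    for (const movement of entry.movements) {
      const current = stats.get(movement);
      if (!current) {
        stats.set(movement, { count: 1, lastDone: entry.date });
      } else {
        current.count += 1;
        if (entry.date > current.lastDone) current.lastDone = entry.date;
      }
    }
  }
  return stats;
}
```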

Now that we have this structured data, the possibilities—all of a sudden—are endless. We can easily, and more importantly, automatically show your strength levels: powerlifting, weightlifting, CrossFit total:

We can derive how good you are at Endurance, Stamina, Power and other GPP components based on your scores on WODs that are high in those:

Because we've determined the time domain of all the WODs you've ever done, we can show whether you're leaning towards shorter or longer ones. Yes, parsing 1300 entries is expensive but at least we can marvel at the end result 😅. Here is coach Mike's real data dating back to 2018. You can see that the early years prioritize short WODs (<12min), whereas in the last couple of years the focus has shifted towards longer, HYROX-style ones:

And, of course, we have the ability to see all the WODs for any given movement (which is why it's so important to parse movements for all the custom WODs and create proper associations). Here you can see that Mike has done 232 lifetime front squat sessions over 8 years — 157 as dedicated lifts and 75 as part of WODs:

In the interest of brevity, I'll stop right here. There are other things powering this pipeline which I'm still refining and can perhaps talk about later: male vs. female benchmarks, age-based adjustment of strength and fitness metrics, smart movement aggregation for the practice skill screen, logging import errors like movements that don't match in our DB, or a smart system of retrying the LLM when parsers fail.

End goal

Now that I've gotten here, I can't help but wonder: what's next? And what's the end goal? I can now replace spreadsheets with this app but it doesn't solve all of my use cases. The dream would be to have an app that can track all of my workouts. This means:

1. Alex Viada came out with Hybrid Athlete around the same time.


PRzilla: CrossFit AI companion 9 Sep 2025 4:00 PM

PRzilla: CrossFit AI companion

Why

When I left LinkedIn, I itched to build something in a space dear to my heart — fitness, and CrossFit specifically. I also wanted the challenge of building a full-stack app, something I'd never done before. The app would solve my pain points but I wanted to release it out there for anyone to use. This meant a database, auth, users, and a production-level user experience. It would be the biggest project I've ever done. With the rise of AI-assisted coding, it was a perfect time.

Problem

I’ve been using SugarWOD to track scores for CrossFit workouts (WODs) prescribed by my gym. But SugarWOD was never designed to be a standalone tracker: it’s missing many WODs, those that are there don’t have any info, and it doesn’t have a way to discover new ones. So I supplemented it with wodwell.com to find more workouts and track their scores.

Wodwell has its own issues: it's full of ads, has a clunky UI, and is slow. More importantly, I wanted to be in control of my data and wodwell has no export. It also wasn't great that my workout history and performance data were split between two platforms.

What

All of this sounded like a perfect opportunity to build just that: a full-stack app with an incredibly easy and fast search through the 1000 most popular WODs. It would allow you to log scores for any of them, to track your progression over time, and to favorite WODs for later. As I started coding with AI, I quickly realized I could go even further: we could get insights into WODs via AI analysis (time domain, difficulty, L1-10 benchmarks, etc.).

How

Before I set out on this journey I wanted to define a few foundational tenets that were non-negotiable.

AI-driven

"Vibe coding" exploded as I was starting this project. My LinkedIn feed was full of "this is incredible" and "this will never work" posts. I came across Addy's article on Cline and decided to build this app entirely with AI as a matter of principle. No manual coding. It would be a perfect experiment since the app was not just a trivial one-pager vibe-coded in a day.

Mobile-ready

Always a fun UI challenge, and certainly a must these days unless you provide a native app. In the context of CrossFit, you often need to look things up or log your scores while in the gym. Every page needed to be responsive and every UI concept needed to be adapted to small and large screens.

Dark mode

Not a terribly complicated constraint, and it's largely solved by using the right foundational abstractions, but it does add cognitive complexity, especially if you're working with AI, as you need to ensure it complies and uses the right tokens.

Stateful

An often overlooked aspect, but it's what separates a polished, predictable app from a clunky, frustrating experience. URLs are the source of truth. Important UI state changes need to be reflected in them. Now you have the power to reload, bookmark, share, go back, and so on.
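In practice this mostly means reading and writing query params; a minimal sketch using the Next.js app-router hooks (simplified, not PRzilla's actual code):

```ts
"use client";
import { usePathname, useRouter, useSearchParams } from "next/navigation";

// Keep a filter in the URL so it survives reload, back/forward, and sharing.
export function useUrlFilter(key: string) {
  const router = useRouter();
  const pathname = usePathname();
  const searchParams = useSearchParams();

  const value = searchParams.get(key) ?? "";

  const setValue = (next: string) => {
    const params = new URLSearchParams(searchParams.toString());
    if (next) params.set(key, next);
    else params.delete(key);
    router.push(`${pathname}?${params.toString()}`); // the URL is the source of truth
  };

  return [value, setValue] as const;
}
```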

Fast

Next.js is known for SSR support out of the box; this means fast server-driven apps. This was a great opportunity for me to learn and experiment with these concepts.

The big lesson I took from introducing these at the beginning: each constraint is a liability, another dimension of your product surface. Be careful about creating too many from the start. Think of the iPhone and its lack of copy-paste for its first few years.

A feature alone is a single point or a line (1D).

  • Add a "mobile-ready" constraint, and that line now exists on a 2D plane (feature x device). You have to test both states.
  • Add "dark mode," and the plane becomes a 3D cube (feature x device x theme).
  • Add "SSR-ready," and you're now in a 4D space.

From Zero to SaaS in 150 Days

I’ve now spent about 5 months working on this daily-ish. I learned a ton about AI assisted coding and wrote about most of it. The lessons never stop and I post them weekly on LinkedIn.

What started as a simple way to see the most common WODs quickly turned into a powerful UI that allows you to find just the right workout. With the power of AI, I've gone deep on classifying workouts to create helpful data that doesn't exist anywhere else out there — difficulty, modality, training stimulus, time domain, and workout characteristics via tags.

When I ask AI to summarize the complexity¹ of the app now:

PRZilla is a large-scale production web application with 123,000 lines of TypeScript code across 818 files, featuring 19 database tables, ~109 React components in 304 TSX files, and 67 tRPC API procedures. The codebase includes 1,532 test cases with 256 E2E tests across 40 test files ensuring critical user journeys, 58 service modules handling complex business logic including 6 AI-powered features, and manages 922 predefined workouts with sophisticated scoring algorithms. This represents approximately 2-3 years of full-time development effort, comparable in complexity to a mid-sized SaaS product.

It’s incredible to see the kind of power you wield with AI. The breadth and depth of functionality certainly feels like it would have taken me 2-3 years. I haven’t written any of this code and honestly can’t imagine having to ever write code manually again.

Cutting wood by hand is slow. Using an electric saw freehand is fast, but it’s how you get a crooked cut. The real leverage comes when you bolt the saw in place at a precise angle, set the exact speed, and let it execute a perfect cut in a minute.

That is exactly how I build software now. I don’t write code manually. And I don't just hand a task to an AI. Instead, I architect the system, protect it with guardrails so it stays the course, and give it specific instructions so it knows exactly the path to follow.

My role has changed: I am the architect and the guardrail engineer.

The Hard Part is Still the Hard Part

Having spent a good amount of time not only developing new features but also refactoring, redesigning UI, and fixing bugs, I can say with good confidence: your app will not fall apart. AI is capable of 95%. The remaining 5% are complex cases that usually reside at the edges of larger system integrations OR are just complex in nature. Those would also be complex for a human, likely even more so.

For example, I struggled to implement well-working lazy loading of WOD cards on the main page because there was already complex state management of various filters that all had to work in unison and support SSR; introducing lazy loading created an X^Y^Z level of state management complexity, and AI struggled to keep everything together without small bugs popping up here and there.

These are the fundamentally hard issues inherent to engineering. AI offers no magic wand for challenges like:

AI also can’t make your app stable if the underlying structure is rotten: fragmented state, logic duplication, complex branches with subtle bugs. But it’s surprisingly good at finding those and fixing them in a heartbeat.

Code != Product

When I look at the app right now I feel like it would have taken me much less time to build the “final” version. Yet, the reality is that development works like this in non-trivial apps:

AI allows you to travel that curvy path much faster. Although you have to be careful, because without proper guardrails you can start swinging too far left and right: you create too much code, too many experiments, push things to prod too fast, all leading to too much liability.

Production-ready

You develop a feature, you have 1 problem.
You decide to release it into production, now you have 10 problems.

Besides the app looking "good" and working "smooth", the most important production-level aspect is making sure you don't break things. In the last 10 years I've worked at big companies where, despite often being on call, you always have dedicated SRE help. You also have a well-oiled infra machine to detect errors in prod and notify you.

Thankfully, for small full-stack apps like mine, platforms like PostHog & Sentry are incredible and provide all-in-one solutions for error monitoring (and more) with generous basic tiers.

No broken windows

I followed a pretty standard, tiered approach to release things safely:

The big takeaway here was not to trust AI with E2E tests. I didn't pay too much attention to all of the assertions at first, then quickly discovered that bugs weren't being caught. Turns out the quality of the E2E assertions was subpar: tests relied only on visibility checks, many used vague assertions or hard‑coded values, and almost none validated data against the database. Tests were slow and flaky due to waitForTimeout calls and text-based or CSS-class selectors. I ended up adding lint rules (via eslint-plugin-playwright) to ensure AI doesn't break this in the future.
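For example, eslint-plugin-playwright ships rules that catch exactly these patterns; a sketch of the kind of config I mean (flat-config style; not my exact setup):

```ts
// ESLint flat-config fragment (sketch): rules that keep AI-written E2E tests honest.
import playwright from "eslint-plugin-playwright";

export default [
  {
    files: ["e2e/**/*.spec.ts"],
    plugins: { playwright },
    rules: {
      "playwright/no-wait-for-timeout": "error",         // no arbitrary sleeps
      "playwright/prefer-web-first-assertions": "error", // auto-retrying expect() over manual checks
      "playwright/expect-expect": "error",               // every test must actually assert something
    },
  },
];
```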

Good design

I struggled with design at first but later found that it’s often a matter of the right prompt. For example, this was prompted to look like Apple Fitness / Whoop in 2025 with Sonnet 4 (which translates to clean, modern and minimal UI with oversized elements):

Compare to the old one:

To summarize, I think at least 80% of my time was spent on making things polished: figuring out UI/UX, refining UI/UX… endlessly, testing various permutations of an app, thinking through edge cases, ensuring it’s tested well, ensuring it’s feature-complete yet not over-engineered, documenting it well, deploying it correctly, and so on.

In the next post, I’ll dive deeper into some of the fitness-heavy concepts I’ve implemented in the app. We’ll talk more about that colorful “My Fitness” page and the complex LLM-powered pipeline that powers it!

1. Code metrics don't tell the whole story, but they do provide a rough idea of this app's scale. Recognizing that AI can introduce bloat, I carefully reviewed and streamlined all committed code. I estimate the result is a lean codebase with no more than 15-20% potential cruft.


Using AI to accurately predict CrossFit workout difficulty and performance 30 May 2025 4:00 PM

Using AI to accurately predict CrossFit workout difficulty and performance

One of the things I’ve geeked out on recently was using AI to assign difficulty and performance bands to a WOD. Not just one but 907 of them (and counting).

I'm building PRzilla.app, which allows you to log scores for various WODs, track your performance, and see all kinds of cool charts about it: how you're progressing, what biases and weaknesses there are, movement prioritization.

Performance levels

In order to measure athlete performance, we need to compare it against a set of “objective” levels. For example, it is commonly accepted that Fran can be done within 3 minutes if you’re an elite CrossFit athlete, with the rest of the bands looking like this:

What is a good score for the “Fran” workout?

  • Beginner: 7-9 minutes
  • Intermediate: 6-7 minutes
  • Advanced: 4-6 minutes
  • Elite: <3 minutes

Wodwell is likely the biggest database of WODs and it shows bands for some of the most common ones, but not all of them. I decided to bring these bands into PRzilla and add them to all workouts.

But how do we measure all of them?

Ideally, we'd have a dedicated panel of experts going over thousands of WODs to figure all of this out. Thankfully, current top-tier AI models are trained on a sufficient volume of CrossFit data and have strong enough reasoning capabilities to do this in a much shorter time.

Subjectivity

Here’s the thing: absolute scores are bound to be subjective and context-dependent!

Even though Fran times are "commonly accepted" as <3, 4-6, 6-7, 7-9, there can also be decent variation among them when adjusted for male vs. female, year/decade measured (CF performance is usually trending upwards), general population vs. experienced CrossFitters, CrossFitters vs. specialized athletes (runners/weightlifters/calisthenic warriors), country/area or a specific gym (Mayhem vs. your typical box), and many more.

This makes calculations tricky, but I think it's still possible to create a range that resembles an averaged-out, close-enough representation. Our "5k run" results are likely more lax than the ones actual runners would use. But for most workouts, it's possible to use reasoning to tell that a 6-minute Fran is roughly top-of-intermediate or bottom-of-advanced.

More importantly, as long as our scoring system is consistent across the board, it’s great for measuring relative performance: either against yourself over time, or against others.

CrossFit specific AI analysis

Top-tier AI models are already trained on enough CrossFit data to be able to determine most of these bands. But we need a few extra layers of careful orchestration to create a solid system at large:

  1. SOTA thinking model(s)
    I used mostly Gemini 2.5 Pro (sometimes Sonnet 3.7) since those were the best reasoning models at the time.

  2. Prompt and context
    Make it act like a CrossFit coach / exercise scientist. “Use your extensive CrossFit knowledge”. Give it extra data to consult and base off of to narrow the scope and context, e.g. Community Cup tiers and measurements.

  3. Memory bank and examples
    In practice, models are limited by a 100-200K token context window, so we can't send our entire JSON consisting of millions of tokens. For relative stability across batches, we need to constantly orient our model to return relatively similar calculations. I used a combination of a memory bank + specific detailed documentation for this plan/feature + a few examples of already existing calculations and reasoning across varied workouts. Reasoning was performed in batches to prevent hallucinations and tackle cost (this was expensive as is).

  4. Internal error correction
    At the end of a calculation, the model needs to double-check its own analysis of the specific WOD for correctness.

  5. External error correction
    At the end of each batch of calculations, the model takes a few random existing scores and compares them to the current calculation; this ensures relative stability across many batches.

Percentiles

CrossFit popularized percentile-based scores during the Open, and — as part of Community Cup — they recently rolled out a scoring system that consists of 5 levels that map directly to percentiles — Rookie (<21%), Novice (22-43%), Intermediate (44-65%), Advanced (66-87%), and Pro (>88%).

I initially went with 4 wodwell-inspired tiers of Beginner, Intermediate, Advanced, Elite, then realized that there's not enough granularity. When working on GPP charts, the scale was 1-10 and plotting "Advanced" on it was washing out the result too much. 60% and 80% could both be considered Advanced, but to go from one to the other might take you a few years! Similarly with the GPP wheel chart: if your stamina is at 60% and strength is at 80%, you would want to see that reflected on the chart as unevenness.
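For intuition only (my illustration, not PRzilla's exact mapping), a 10-level scale tracks percentiles much more closely than 4 named tiers:

```ts
// 4 tiers: a 60th and an 80th percentile athlete both land on "Advanced".
// 10 levels: they land on L6 and L8, which is what you want to see on a chart.
function toLevel(percentile: number): number {
  return Math.min(10, Math.max(1, Math.ceil(percentile / 10)));
}

toLevel(60); // 6
toLevel(80); // 8
```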

Example: Frelen

Here is a raw example of one of the calculations and the model's reasoning. I already had a 4-tier system/data that was generated using a similar heuristic; each WOD had a difficulty, difficultyExplanation (the one the model derived from its reasoning before), type, and levels.

I then used a model to derive 10 levels by giving it the existing framework of how we derived those 4 levels + the new system of 10 levels + examples.

Notice how it analyzes the WOD step by step; it understands that it's similar to "Helen" and "Eva" since both follow a similar triplet pattern of run, x, pull-ups, with this one being closer to "Eva" in terms of volume; it calculates rough times for the run and thrusters while accounting for fatigue and number of rounds; it adjusts the edges of the 10-tier scale to extend beyond the 4-tier one and even considers that, because it's a "hard" WOD, the beginner level is to be extended by 7 min.

This now allows us to see where our scores stand for any WOD, such as this L7 that falls within 4:00-5:00 for Diane.

Before doing a workout, you can take a look at the performance guide and have a better idea which time to shoot for to get into a certain percentile.

AI-derived difficulty

Using similar training and reasoning, I was also able to create difficulty levels for all WODs. You've already seen "Frelen" categorized as "Hard" earlier.

Here's the actual documentation used when orienting AI to work with this data. AI uses this as a framework to understand the general structure of a workout, and then adjusts difficulty based on modifiers like volume, skill, and load.

Difficulty examples

This produces strikingly accurate results. Note AI's explanation for why the difficulty is set a certain way:

Fun fact: the "Extremely Hard" category did not exist until I introduced CrossFit Games workouts, at which point AI proactively came up with it. It made sense, as the relative difficulty was objectively higher in those! Only 15 out of 907 are currently categorized as such.

Effort vs Complexity

Some of you will certainly scoff at a "1k row" categorized as easy. A simple movement like that can absolutely be made into a grueling test of strength, grit, endurance and stamina. The difficulty in PRzilla is not about how hard something can be made but how demanding it is on skill/strength/endurance. A 1k row is easy in the sense that it can be performed by almost any person and can be completed with little effort as prescribed. You can't say the same about Amanda, which will have you do 21 ring muscle-ups together with 21 squat snatches at 135lb — feats that can take you years to master individually, not to mention being able to superset them.

Extreme skills

Speaking of extremely hard tests, it was interesting to see how AI estimates something like “Triple unders: max reps” or “Free standing handstand push-ups: max reps”:

This is a "Very Hard" test of max unbroken triple-unders. This is an extremely high-skill movement. Even a single rep is a significant achievement for many.

High-skill jump rope variation requiring exceptional timing, coordination, and wrist speed.

  • L1: 0 reps (effectively cannot complete)
  • L2: 0 reps
  • L3: 0 reps
  • L4: 0 reps
  • L5: 1-4 reps
  • L6: 5-10 reps
  • L7: 11-15 reps
  • L8: 16-21 reps
  • L9: 22-36 reps
  • L10: >=36 reps

Because we maintain relative difficulty, even an intermediate score on such tests is a great achievement. And the model understands that beginners (up until level 5) are unlikely to complete even 1 rep.

Timeline and adjusted performance

Once we know your performance levels on all the WODs, it's easy to plot them over time for a chart like this that shows "fitness level" progression and trend. And here's something even more fun — because we have each WOD's difficulty, we can adjust your score to be more representative of real-life performance (meaning that getting "Intermediate" in a "Very Hard" WOD is closer to getting "Advanced" in a "Hard" one):

adjustedLevel = cap(scoreLevel + difficultyBonus, 0, 10)

…where difficultyBonus is something simple like:

Easy: -0.5, Medium: +0.0, Hard: +0.5, Very Hard: +1.0, Extremely Hard: +1.5
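Spelled out in code, the adjustment is tiny (a sketch using the numbers above):

```ts
// Difficulty-adjusted performance level, capped to the 0-10 scale.
const DIFFICULTY_BONUS: Record<string, number> = {
  Easy: -0.5,
  Medium: 0.0,
  Hard: 0.5,
  "Very Hard": 1.0,
  "Extremely Hard": 1.5,
};

function adjustedLevel(scoreLevel: number, difficulty: string): number {
  const bonus = DIFFICULTY_BONUS[difficulty] ?? 0;
  return Math.min(10, Math.max(0, scoreLevel + bonus));
}

// Getting an "Intermediate"-ish L6 on a "Very Hard" WOD reads as an L7 overall.
adjustedLevel(6, "Very Hard"); // 7
```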

Work in progress

Give these estimates a try — do they feel right? Could anything be improved? I'm planning to refine these in PRzilla for an even deeper understanding of workout stimulus; similar to the Community Cup, we could be better at gender and age group adjustments. There are also gaps right now with certain WODs that have a time cap and so are a hybrid of time (if completed within the cap) and reps/load (if completed at the cap).

In the future, I’m planning to add an option to input any custom WOD and get an estimate of its difficulty and performance levels.

1. For Time: 500 meter Row, 40 Air Squats, 30 Sit-Ups, 20 Push-Ups, 10 Pull-Ups
2. 50-40-30-20-10 Reps For Time: Double-Unders, Sit-Ups
3. 7 Rounds For Time: 15 Kettlebell Swings (1.5/1 pood), 15 Power Cleans (95/65 lb), 15 Box Jumps (24/20 in)
4. 6 Rounds For Time: 60 Double-Unders, 30 Kettlebell Swings (1.5/1 pood), 15 Burpees
5. 5 Rounds For Time: 400 meter Run, 30 Box Jumps (24/20 in), 30 Wall Ball Shots (20/14 lb)
6. 5 Rounds For Time: 20 Handstand Push-Ups, 40 Pull-Ups, 60 Pistols (Alternating Legs)
7. 7 Rounds For Time: 7 Handstand Push-Ups, 7 Thrusters (135/95 lb), 7 Knees-to-Elbows, 7 Deadlifts (245/165 lb), 7 Burpees, 7 Kettlebell Swings (2/1.5 pood), 7 Pull-Ups
8. For Time: 1 mile Run, 100 Handstand Push-Ups, 200 Alternating Pistols, 300 Pull-Ups, 1 mile Run. Wear a Weight Vest (20/14 lb)
9. For Time: 1,500 meter Row, then 5 Rounds of: 10 Bar Muscle-Ups, 7 Shoulder-to-Overheads (235/145 lb)


Javascript quiz. ES6 edition. 3 Nov 2015 3:00 PM

Javascript quiz. ES6 edition.

Remember that crazy Javascript quiz from 6 years ago? Craving to solve another set of mind-bending snippets no sensible developer would ever use in their code? Looking for a new installment of the most ridiculous Javascript interview questions?

Look no further! The "ECMAScript Two Thousand Fifteen" installment of good old Javascript Quiz is finally here.

The rules are as usual:

  • Assuming ECMA-262 6th Edition
  • Implementation quirks do not count (assuming standard behavior only)
  • Every snippet is run as a global code (not in eval, function, or module contexts)
  • There are no other variables declared (and host environment is not extended with anything beyond what's defined in a spec)
  • Answer should correspond to exact return value of entire expression/statement (or last line)
  • "Error" in answer indicates that overall snippet results in a compile or runtime error
  • Cheating with Babel doesn't count (and there could even be bugs!)

The quiz goes over such ES6 topics as: classes, computed properties, spread operator, generators, template strings, and shorthand properties. It's relatively easy, but still tricky. It tries to cover various ES6 features — a little bit of this, a little bit of that — but it's certainly still only a tiny subset.

If you can think of other silly riddle ideas to break one's head against, please post them in the comments. For a slightly harder version, feel free to explore some of the tests in our compat table or perhaps something from TC39 official test suite.

Ready? Here we go.

[The 13 quiz snippets and the results widget are interactive embeds in the original post.]

I hope you enjoyed it. I'll try to write up an explanation for these in the near future.



The poor, misunderstood innerText 31 Mar 2015 4:00 PM

The poor, misunderstood innerText

Few things are as misunderstood and misused on the web as the innerText property. That quirky, non-standard way of retrieving an element's text, [introduced by Internet Explorer](https://msdn.microsoft.com/en-us/library/ie/ms533899%28v=vs.85%29.aspx) and later "copied" by both WebKit/Blink and Opera for web-compatibility reasons. It's usually seen in combination with textContent — as a cross-browser way of using the standard property followed by the proprietary one. Or as the main webcompat offender in [numerous Mozilla tickets](https://bugzilla.mozilla.org/show_bug.cgi?id=264412#c24) — since Mozilla is one of the only major browsers refusing to add this non-standard property — when someone doesn't know what they're doing and skips the textContent "fallback" altogether.

innerText is pretty much always frowned upon. After all, why would you want to use a non-standard property that does the "same" thing as a standard one? Very few people venture to actually check the differences, and on the surface it certainly appears as if there are none. Those curious enough to investigate further usually do find them, but only slight ones, and only when retrieving text, not setting it.

Back in 2009, I did just that. And I even wrote [this StackOverflow answer](http://stackoverflow.com/a/1359822/130652) on the exact differences — slight whitespace deviations, things like inclusion of <script> contents by textContent (but not innerText), differences in interface (Node vs. HTMLElement), and so on. All this time I was strongly convinced that there isn't much else to know about textContent vs. innerText. Just steer away from innerText, use this "combo" for cross-browser code, keep in mind slight differences, and you're golden.

Little did I know that I was merely looking at the tip of the iceberg and that my perception of innerText would change drastically. What you're about to hear is the story of Internet Explorer getting something right, the real differences between these properties, and how we probably want to standardize this red-headed stepchild.
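For reference, the "combo" retrieval mentioned above typically looks something like this (a minimal sketch):

```ts
// The typical cross-browser "combo": prefer the standard textContent,
// fall back to the proprietary innerText (textContent can be null in the DOM types).
function getText(element: HTMLElement): string {
  return element.textContent || element.innerText || "";
}
```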

The real difference

A little while ago, I was helping someone with the implementation of a text editor in a browser. This is when I realized just how ridiculously important these seemingly insignificant whitespace deviations between textContent and innerText are. Here's a simple example:

See the Pen gbEWvR by Juriy Zaytsev (@kangax) on CodePen.

Notice how innerText almost precisely represents how text appears on the page. textContent, on the other hand, does something strange — it ignores newlines created by <br> and around styled-as-block elements (<span> in this case). But it preserves spaces as they are defined in the markup. What does it actually do? Looking at the [spec](http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Node3-textContent), we get this:
This attribute returns the text content of this node and its descendants. [...]

On getting, no serialization is performed, the returned string does not contain any markup. No whitespace normalization is performed and the returned string does not contain the white spaces in element content (see the attribute Text.isElementContentWhitespace). [...]

The string returned is made of the text content of this node depending on its type, as defined below:

For ELEMENT_NODE, ATTRIBUTE_NODE, ENTITY_NODE, ENTITY_REFERENCE_NODE, DOCUMENT_FRAGMENT_NODE:

     concatenation of the textContent attribute value of every child node, excluding COMMENT_NODE and PROCESSING_INSTRUCTION_NODE nodes. This is the empty string if the node has no children.

For TEXT_NODE, CDATA_SECTION_NODE, COMMENT_NODE, PROCESSING_INSTRUCTION_NODE

     nodeValue
In other words, textContent returns the concatenated text of all text nodes. Which is almost like taking markup (i.e. innerHTML) and stripping it of the tags. Notice that no whitespace normalization is performed; the text and whitespace are essentially spit out the same way they're defined in the markup. If you have a giant chunk of newlines in the HTML source, you'll have them as part of textContent as well.

While investigating these issues, I came across a [fantastic blog post by Mike Wilcox](http://clubajax.org/plain-text-vs-innertext-vs-textcontent/) from 2010, and pretty much the only place where someone tries to bring attention to this issue. In it, Mike takes a stab at the same things I'm describing here, saying these true-to-the-bone words:
Internet Explorer implemented innerText in version 4.0, and it’s a useful, if misunderstood feature. [...]

The most common usage for these properties is while working on a rich text editor, when you need to “get the plain text” or for other functional reasons. [...]

Because “no whitespace normalization is performed”, what textContent is essentially doing is acting like a PRE element. The markup is stripped, but otherwise what we get is exactly what was in the HTML document — including tabs, spaces, lack of spaces, and line breaks. It’s getting the source code from the HTML! What good this is, I really don’t know.
Knowing these differences, we can see just how potentially misleading (and dangerous) a typical textContent || innerText retrieval is. It's pretty much like saying:

The case for innerText

Coming back to a text editor... Let's say we have a [contenteditable](http://html5demos.com/contenteditable) area in which a user is writing something. And we'd like to have our own spelling correction of the text in that area. In order to do that, we really want to analyze text the way it appears in the browser, not in the markup. We'd like to know if there are newlines or spaces typed by a user, and not those that are in the markup, so that we can correct the text accordingly.

This is just one use-case of plain text retrieval. Perhaps you might want to convert written text to another format (PDF, SVG, image via canvas, etc.) in which case it has to look exactly as it was typed. Or maybe you need to know the cursor position in a text (or its entire length), so you need to operate on the text the way it's presented. I'm sure there are more scenarios.

A good way to think about innerText is as if the text was selected and copied off the page. In fact, this is exactly what WebKit/Blink does — it [uses the same code](http://lists.w3.org/Archives/Public/public-html/2011Jul/0133.html) for Selection#toString serialization and innerText!

Speaking of that — if innerText is essentially the same thing as a stringified selection, shouldn't it be possible to emulate it via Selection#toString? It sure is, but as you can imagine, the performance of such a thing [leaves much to be desired](http://jsperf.com/innertext-vs-selection-tostring/4) — we need to save the current selection, then change the selection to contain the entire element contents, get the string representation, then restore the original selection (a sketch follows below). The problems with this frankenstein of a workaround are performance, complexity, and clarity. It shouldn't be so hard to get a "plain text" representation of an element. Especially when there's an already "implemented" property that does just that.

Internet Explorer got this right — textContent and Selection#toString are poor contenders in cases like this; innerText is exactly what we need. Except that it's non-standard, and unsupported by one major browser. Thankfully, at least Chrome (Blink) and Safari (WebKit) were considerate enough to imitate it. One would hope there are no deviations among their implementations. Or are there?
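Here's roughly what that Selection#toString workaround looks like (a minimal sketch, reconstructed for illustration):

```ts
// Emulating innerText via Selection#toString (a sketch).
// Save the current selection, select the element's contents,
// serialize, then restore whatever the user had selected.
function getSelectionText(element: HTMLElement): string {
  const selection = window.getSelection();
  if (!selection) return "";

  // Remember existing ranges so we can put them back afterwards.
  const savedRanges: Range[] = [];
  for (let i = 0; i < selection.rangeCount; i++) {
    savedRanges.push(selection.getRangeAt(i).cloneRange());
  }

  const range = document.createRange();
  range.selectNodeContents(element);
  selection.removeAllRanges();
  selection.addRange(range);

  const text = selection.toString();

  // Restore the original selection.
  selection.removeAllRanges();
  savedRanges.forEach((r) => selection.addRange(r));

  return text;
}
```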

Differences with textContent

Once I realized the significance of innerText, I wanted to see the differences between the 2 engines. Since there was nothing like this out there, I set out on a path to explore it. In true ["cross-browser madness" traditions](http://unixpapa.com/js/key.html), what I've found was not for the faint of heart.

I started with the (now extinct) [test suite by Aryeh Gregor](https://web.archive.org/web/20110205234444/http://aryeh.name/spec/innertext/test/innerText.html) and [added a few more things](http://kangax.github.io/jstests/innerText/) to it. I also searched WebKit/Blink bug trackers and included [whatever](https://code.google.com/p/chromium/issues/detail?id=96839) [relevant](https://bugs.webkit.org/show_bug.cgi?id=14805) [things](https://bugs.webkit.org/show_bug.cgi?id=17830) I found there. The table above (and in the test suite) shows all the gory details, but a few things are worth mentioning.

First, good news — Internet Explorer <=9 are identical in their behavior :) Now bad — everything else diverges. Even IE changes with each new version — 9, 10, 11, and Tech Preview (the unreleased version of IE that's currently in the making) are all different. What's also interesting is how WebKit copied some of the old-IE traits — such as not including contents of <script> and <style> elements — and then when IE changed, they naturally drifted apart. Currently, some of the WebKit/Blink behavior is like old-IE and some isn't.

But even compared to the original versions, WebKit did a poor job copying this feature, or rather, it seems like they've tried to make it better! Unlike IE, WebKit/Blink insert tabs between table cells — that kind of makes sense! They also preserve upper/lower-cased text, which is arguably better. They don't include hidden elements ("display:none", "visibility:hidden"), which makes sense too. And they don't include contents of <select> elements and <canvas>/<video> fallback — perhaps a questionable aspect — but reasonable as well.

Ok, there's more good news. Notice that IE Tech Preview (Spartan) is now much closer to WebKit/Blink. There are only 9 aspects they differ in (compared to 10-11 in earlier versions). That's still a lot but there's at least some hope for convergence. Most notably, IE again stopped including <script> and <style> contents, and — for the first time ever — stopped including "display:none" elements (but not "visibility:hidden" ones — more on that later).

Opera mess

You might have caught the lack of Opera in the table. It's not just because Opera is now using the Blink engine (essentially having WebKit behavior). It's also due to the fact that when it wasn't on Blink, it was reaaaally naughty when it comes to innerText. To sustain web compatibility, Opera simply went ahead and "aliased" innerText to textContent. That's right, in Opera, innerText would return nothing close to what we see in IE or WebKit. There's simply no point including it in the table; it would diverge in every single aspect, and we can just consider it as never implemented.

Note on performance

Another difference lurks behind textContent and innerText — performance. You can find dozens of [tests on jsperf.com comparing innerText and textContent](http://jsperf.com/search?q=innerText) — innerText is often dozens of times slower. In [this blog post](http://www.kellegous.com/j/2013/02/27/innertext-vs-textcontent/), Kelly Norton talks about innerText being up to 300x slower (although that seems like a particularly rare case) and advises against using it entirely.

Knowing the underlying concepts of both properties, this shouldn't come as a surprise. After all, innerText requires knowledge of layout and [anything that touches layout is expensive](http://gent.ilcore.com/2011/03/how-not-to-trigger-layout-in-webkit.html). So for all intents and purposes, innerText is significantly slower than textContent. And if all you need is to retrieve the text of an element without any kind of style awareness, you should — by all means — use textContent instead. However, this style awareness of innerText is exactly what we need when retrieving text "as presented"; and that comes with a price.

What about jQuery?

You're probably familiar with jQuery's text() method. But how exactly does it work and what does it use — textContent || innerText combo or something else? Turns out, jQuery [takes a safe route](https://github.com/jquery/jquery/blob/7602dc708dc6d9d0ae9982aadb9fa4615a9c49fa/external/sizzle/dist/sizzle.js#L942-L971) — it either returns textContent (if available), or manually does what textContent is supposed to do — iterates over all children and concatenates their nodeValue's. Apparently, at one point jQuery **did** use innerText, but then [ran into good old whitespace differences](http://bugs.jquery.com/ticket/11153) and decided to ditch it altogether. So if we wanted to use jQuery to get real text representation (à la innerText), we can't use jQuery's text() since it's basically a cross-browser textContent. We would need to roll our own solution.
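For illustration, the textContent-style fallback jQuery uses boils down to something like this (a simplified sketch, not jQuery's actual code):

```ts
// Simplified sketch of a textContent-style getText, à la jQuery/Sizzle:
// concatenate nodeValue of all descendant text nodes.
function getTextLikeJQuery(node: Node): string {
  if (node.nodeType === Node.TEXT_NODE || node.nodeType === Node.CDATA_SECTION_NODE) {
    return node.nodeValue ?? "";
  }
  if (node.nodeType === Node.ELEMENT_NODE || node.nodeType === Node.DOCUMENT_FRAGMENT_NODE) {
    let text = "";
    for (let child = node.firstChild; child; child = child.nextSibling) {
      text += getTextLikeJQuery(child);
    }
    return text;
  }
  return "";
}
```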

Standardization attempts

Hopefully by now I've convinced you that innerText is pretty damn useful; we went over the underlying concept, browser differences, performance implications and saw how even an all-mighty jQuery is of no help. You would think that by now this property is standardized or at least making its way into the standard. Well, not so fast.

Back in 2010, Adam Barth (of Google) [proposes to spec innerText](http://lists.w3.org/Archives/Public/public-whatwg-archive/2010Aug/0455.html) in a WHATWG mailing list. Funny enough, all Adam wants is to set pure text (not markup!) of an element in a secure way. He also doesn't know about textContent, which would certainly be a preferred (standard) way of doing that. Fortunately, Mike Wilcox, whose blog post I mentioned earlier, chimes in with:
In addition to Adam's comments, there is no standard, stable way of *getting* the text from a series of nodes. textContent returns everything, including tabs, white space, and even script content. [...] innerText is one of those things IE got right, just like innerHTML. Let's please consider making that a standard instead of removing it.
In the same thread, Robert O'Callahan (of Mozilla) [doubts usefulness of innerText](http://lists.w3.org/Archives/Public/public-whatwg-archive/2010Aug/0477.html) but also adds:
But if Mike Wilcox or others want to make the case that innerText is actually a useful and needed feature, we should hear it. Or if someone from Webkit or Opera wants to explain why they added it, that would be useful too.
Ian Hixie is open to adding it to a spec if it's needed for web compatibility. While Rob O'Callahan considers this a redundant feature, Maciej Stachowiak (of WebKit/Apple) hits the nail on the head with [this fantastic reply](http://lists.w3.org/Archives/Public/public-whatwg-archive/2010Aug/0480.html):
Is it a genuinely useful feature? Yes, the ability to get plaintext content as rendered is a useful feature and annoying to implement from scratch. To give one very marginal data point, it's used by our regression text framework to output the plaintext version of a page, for tests where layout is irrelevant. A more hypothetical use would be a rich text editor that has a "convert to plaintext" feature. textContent is not as useful for these use cases, since it doesn't handle line breaks and unrendered whitespace properly.
[...]
These factors would tend to weigh against removing it.
To which Rob gives a reasonable reply:
There are lots of ways people might want to do that. For example, "convert to plaintext" features often introduce characters for list bullets (e.g. '*') and item numbers. (E.g., Mac TextEdit does.) Safari 5 doesn't do either. [...] Satisfying more than a small number of potential users with a single attribute may be difficult.
And the conversation dies out.

Is innerText really useful?

As Rob points out, "convert to plaintext" could certainly be an ambiguous task. In fact, we can easily create a test markup that looks nothing like its "plain text" version:

See the Pen emXMKZ by Juriy Zaytsev (@kangax) on CodePen.

Notice that "opacity: 0" elements are not displayed, yet they are part of innerText. Ditto with the infamous "text-indent: -999px" hiding technique. The bullets from the list are not accounted for and neither is dynamically generated content (via the ::after pseudo selector). Paragraphs only create 1 newline, even though in reality they could have gigantic margins.

But I think that's OK. If you think of innerText as text copied from the page, most of these "artifacts" make perfect sense. Just because a chunk of text is given "opacity: 0" doesn't mean that it shouldn't be part of the output. It's a purely presentational concern, just like bullets, space between paragraphs, or indented text. What matters is **structural preservation** — block-styled elements should create newlines, inline ones should be inline.

One iffy aspect is probably "text-transform". Should capitalized or uppercased text be preserved? WebKit/Blink think it should; Internet Explorer doesn't. Is it part of the text itself or merely styling? Another one is "visibility: hidden". Similar to "opacity: 0" (and unlike "display: none"), the text is still part of the flow, it just can't be seen. Common sense would suggest that it should still be part of the output. And while Internet Explorer does just that, WebKit/Blink disagree (also being curiously inconsistent with their "opacity: 0" behavior).

Elements that aren't known to a browser pose an additional problem. For example, WebKit/Blink recently started supporting the <template> element. That element is not displayed, and so it is not part of innerText. To Internet Explorer, however, it's nothing but an unknown inline element, and of course it outputs its contents.

Standardization, take 2

In 2011, another innerText proposal [is posted to WHATWG mailing list](http://lists.w3.org/Archives/Public/public-html/2011Jul/0133.html), this time by Aryeh Gregor. Aryeh proposes to either:
  1. Drop innerText entirely
  2. Spec innerText to be like textContent
  3. Actually spec innerText according to whatever IE/WebKit are doing
Similar to previous discussions, Mozilla opposes the 3rd option (standardizing it), whereas Microsoft and Opera oppose the 1st one (dropping it). In the same thread, Aryeh expresses his concerns about standardizing innerText:
The problem with (3) is that it would be very hard to spec; it would be even harder to spec in a way that all browsers can implement; and any spec would probably have to be quite incompatible anyway with the existing implementations that follow the general approach. [...]
Indeed, as we've seen from the tests, compatibility proves to be a serious issue. If we were to standardize innerText, which of the 2 behaviors should we put in a spec? Another problem is reliance on Selection.toString() (as expressed by Boris Zbarsky):
It's not clear whether the latter is in fact an option; that depends on how Selection.toString gets specified and whether UAs are willing to do the same for innerText as they do for Selection.toString....

So far the only proposal I've seen for Selection.toString is "do what the copy operation does", which is neither well-defined nor acceptable for innerText. In my opinion.
In the end, we're left with [this WHATWG ticket by Aryeh](https://www.w3.org/Bugs/Public/show_bug.cgi?id=13145) on specifying innerText. Things look rather grim, as evidenced in one of the comments:
I've been told in no uncertain terms that it's not practical for non-Gecko browsers to remove. Depending on the rendering tree to the extent WebKit does, on the other hand, is insanely complicated to spec in terms of standard stuff like DOM and CSS. Also, it potentially breaks for detached nodes (WebKit behaves totally differently in that case). [...] But Gecko people seemed pretty unhappy about this kind of complexity and rendering dependence in a DOM property. And on the other hand, I got the impression WebKit is reluctant to rewrite their innerText implementation at all. So I'm figuring that the spec that will be implemented by the most browsers possible is one that's as simple as possible, basically just a compat shim. If multiple implementers actually want to implement something like the innerText spec I started writing, I'd be happy to resume work on it, but that wasn't my impression.
We can't remove it, can't change it, can't spec it to depend on rendering, and spec'ing it would be quite difficult :)

Light at the end of a tunnel?

Could there still be some hope for innerText or will it forever stay an unspecified legacy with 2 different implementations? My hope is that the test suite and compatibility table are the first step in making things better. We need to know exactly how engines differ, and we need to have a good understanding of what to include in a spec. I'm sure this doesn't cover all cases, but it's a start (other aspects worth exploring: shadow DOM, detached nodes). I think this test suite should be enough to write a 90%-complete spec of innerText. The biggest issue is deciding which behavior to choose between IE and WebKit/Blink. The plan could be:

  1. Write a spec
  2. Try to converge IE and WebKit/Blink behavior
  3. Implement spec'd behavior in Firefox

Seeing [how amazing Microsoft has been](https://status.modern.ie/) recently, I really hope we can make this happen.

The naive spec

I took a stab at a relatively simple version of innerText (a sketch of that direction follows below). A couple of important tasks here:

  1. Checking if a text node is within a "formatted" context (i.e. a child of a "white-space: pre-*" node), in which case its contents should be concatenated as is; otherwise collapse all whitespace to 1 space.
  2. Checking if a node is block-styled ("block", "list-item", "table", etc.), in which case it has to be surrounded by newlines; otherwise, it's inline and its contents are output as is.

Then there are things like ignoring <script>, <style>, etc. nodes and inserting a tab ("\t") between <td> elements (to follow WebKit/Blink's lead).

This is still a very minimal and naive implementation. For one, it doesn't collapse newlines between block elements — a quite important aspect. In order to do that, we need to keep track of more state — to know information about the previous node's style. It also doesn't normalize whitespace in a "true" manner — a text node with leading and trailing spaces, for example, should have those spaces stripped if it is (the only node?) in a block element. This needs more work, but it's a decent start.

It would also be a good idea to write an innerText implementation in Javascript, with unit tests for each of the "features" in the compat table. Perhaps even supporting 2 modes — IE and WebKit/Blink. An implementation like this could then be simply integrated into non-supporting engines (or used as a proper polyfill).

I'd love to hear your thoughts, ideas, experiences, criticism. I hope (with all of your help) we can make some improvement in this direction. And even if nothing changes, at least some light was shed on this very misunderstood ancient feature.
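A rough sketch of that naive approach, reconstructed here for illustration (not the exact code from the original post):

```ts
// Naive innerText sketch: style-aware text extraction.
// Handles: skipping <script>/<style>, "white-space: pre-*" contexts,
// newlines around block-level elements, and tabs between <td> cells.
function naiveInnerText(el: Element): string {
  const BLOCK_DISPLAYS = new Set(["block", "list-item", "table", "table-row", "flex", "grid"]);

  function walk(node: Node): string {
    if (node.nodeType === Node.TEXT_NODE) {
      const parent = node.parentElement;
      const whiteSpace = parent ? getComputedStyle(parent).whiteSpace : "normal";
      const text = node.nodeValue ?? "";
      // In a "formatted" (pre-*) context, keep text as is; otherwise collapse whitespace.
      return whiteSpace.startsWith("pre") ? text : text.replace(/\s+/g, " ");
    }

    if (node.nodeType !== Node.ELEMENT_NODE) return "";
    const element = node as Element;
    const tag = element.tagName.toLowerCase();

    if (tag === "script" || tag === "style") return "";
    if (tag === "br") return "\n";

    let text = "";
    for (let child = element.firstChild; child; child = child.nextSibling) {
      text += walk(child);
    }

    if (tag === "td") return text + "\t"; // tab between table cells, à la WebKit/Blink
    const display = getComputedStyle(element).display;
    if (BLOCK_DISPLAYS.has(display)) return "\n" + text + "\n"; // surround blocks with newlines
    return text;
  }

  return walk(el);
}
```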

Update: half a year later

It's been half a year since I wrote this post, and a few things changed for the better! First of all, [Robert O'Callahan](http://robert.ocallahan.org/) of Mozilla made some awesome effort — he decided to [spec out innerText](https://github.com/rocallahan/innerText-spec) and then implemented it in Firefox. The idea was to create something simple but sensible. The proposed spec — only after about 11 years — is now [implemented in Firefox 45](https://bugzilla.mozilla.org/show_bug.cgi?id=264412) :) I've added FF45 results to [the compat table](http://kangax.github.io/jstests/innerText/) and, aside from a couple of differences, FF is pretty close to Chrome's implementation. I'm also planning to add more tests to find any other differences among Chrome, FF, and Edge. The spec already revealed a few bugs in Chrome, which I'm hoping to file tickets for and see resolved. If we can then also get Edge to converge, we'll be very close to having all 3 biggest browsers behave similarly, making `innerText` a viable feature in the near future.


Know thy reference 10 Dec 2014 3:00 PM (10 years ago)

Know thy reference

Abusing leaky abstractions for a better understanding of “this”

It was a sunny Monday morning when I woke up to an article on HackerNews, simply named “This in Javascript”. Curious to see what all the attention was about, I started skimming through. As expected, there were mentions of this in global scope, this in function calls, this in constructor instantiation, and so on. It was a long article. And the more I looked through it, the more I realized just how overwhelming this topic might seem to folks unfamiliar with the intricacies of this, especially when thrown into a myriad of examples with seemingly random behavior.

It made me remember a moment from a few years ago when I first read Crockford’s Good Parts. In it, Douglas succinctly laid out a piece of information that immediately made everything much clearer in my head:

The `this` parameter is very important in object oriented programming, and its value is determined by the invocation pattern. There are four patterns of invocation in JavaScript: the method invocation pattern, the function invocation pattern, the constructor invocation pattern, and the apply invocation pattern. The patterns differ in how the bonus parameter this is initialized.

Determined by invocation and only 4 cases? Well, that’s certainly pretty simple.

With this thought in mind, I went back to HackerNews, wondering if anyone else thought the subject was presented as something way too complicated. I wasn’t the only one. Lots of folks chimed in with an explanation similar to the one from Good Parts, like this one:

Even more simply, I'd just say:
1) The keyword "this" refers to whatever is left of the dot at call-time.
2) If there's nothing to the left of the dot, then "this" is the root scope (e.g. Window).
3) A few functions change the behavior of "this"—bind, call and apply
4) The keyword "new" binds this to the object just created

Great and simple breakdown. But one point caught my attention — “whatever is left of the dot at call-time”. Seems pretty self-explanatory. For foo.bar(), this would refer to foo; for foo.bar.baz(), this would refer to foo.bar, and so on. But what about something like (f = foo.bar)()? After all, it seems that “whatever is left of the dot at call time” is foo.bar. Would that make this refer to foo?

Eager to save the world from unusual results in obscure cases, I rushed to leave a prompt comment on how the concept of “left of the dot” could be hairy, and how, for best results, one should understand the concept of References and their base values.

It is then that I shockingly realized that this concept of references actually hasn’t been covered all that much! In fact, searching for “javascript reference” yielded anything from cheatsheets to “pass-by-reference vs. pass-by-value” discussions, and not at all what I wanted. It had to be fixed.

And so this brings me here.

I’ll try to explain what these mysterious References are in Javascript (by which, of course, I mean ECMAScript) and how fun it is to learn this behavior through them. Once you understand References, you’ll also notice that reading ECMAScript spec is much easier.

But before we continue, quick disclaimer on the excerpt from Good Parts.

Good Parts 2.0

The book was written back when ES3 roamed the prairies; these days we’re firmly in ES5 territory.

What changed? Not much.

There are 2 additions, or rather sub-points, to the list of 4:

  1. method invocation
  2. function invocation
    • “use strict” mode (new in ES5)
  3. constructor invocation
  4. apply invocation
    • Function.prototype.bind (new in ES5)

Function invocation that happens in strict mode now has its this value set to undefined. Actually, it would be more correct to say that it does NOT have its this “coerced” to the global object. That’s what was happening in ES3 and what happens in ES5-non-strict. Strict mode simply avoids that extra step, letting undefined propagate through.

And then there’s good old Function.prototype.bind which is hard to even call an addition. It’s essentially call/apply wrapped in a function, permanently binding this value to whatever was passed to bind(). It’s in the same bracket as call and apply, except for its “static” nature.

Alright, on to the References.

Reference Specification Type

To be honest, I wasn’t that surprised to find very little information on References in Javascript. After all, it’s not part of the language per se. References are only a mechanism, used to describe certain behaviors in ECMAScript. They’re not really “visible” to the outside world. They are vital for engine implementors, and users of the language don’t need to know about them.

Except when understanding them brings a whole new level of clarity.

Coming back to my original “obscure” example:
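Reconstructed here as a minimal sketch (the exact original snippet may have differed slightly):

```js
var foo = {
  bar: function() { return this; }
};
var f;

foo.bar();        // `this` is foo
(f = foo.bar)();  // `this` is the global object (or undefined in strict mode)
```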

How do we know that 1st one’s this references foo, but 2nd one — global object (or undefined)?

Astute readers will rightfully notice — “well, the expression to the left of () evaluates to f, right after assignment; and so it’s the same as calling f(), making this function invocation rather than method invocation.”

Alright, and what about this:
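Presumably a snippet along these lines (a reconstruction):

```js
(1, foo.bar)(); // what does `this` reference here?
```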

“Oh, that’s just the grouping operator! It evaluates from left to right, so it must be the same as foo.bar(), making this reference foo”

“Strange”

And how about this:
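And, again reconstructed:

```js
(foo.bar)(); // …and here?
```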

“Well… considering the last example, it must be undefined as well then? There must be something about those parentheses”

“Ok, I’m confused”

Theory

ECMAScript defines Reference as a “resolved name binding”. It’s an abstract entity that consists of three components — base, name, and strict flag. The first 2 are what’s important for us at the moment.

There are 2 cases when Reference is created: in the process of Identifier resolution and during property access. In other words, foo creates a Reference and foo.bar (or foo['bar']) creates a Reference. Neither literals — 1, "foo", /x/, { }, [ 1,2,3 ], etc., nor function expressions — (function(){}) — create references.

Here’s a simple cheat sheet:

Cheat sheet

| Example | Reference? | Notes |
|---|---|---|
| `"foo"` | No | |
| `123` | No | |
| `/x/` | No | |
| `({})` | No | |
| `(function(){})` | No | |
| `foo` | Yes | Could be an unresolved reference if `foo` is not defined |
| `foo.bar` | Yes | Property reference |
| `(123).toString` | Yes | Property reference |
| `(function(){}).toString` | Yes | Property reference |
| `(1,foo.bar)` | No | Already evaluated, BUT see grouping operator exception |
| `(f = foo.bar)` | No | Already evaluated, BUT see grouping operator exception |
| `(foo)` | Yes | Grouping operator does not evaluate reference |
| `(foo.bar)` | Yes | Ditto with property reference |

Don’t worry about last 4 for now; we’ll take a look at those shortly.

Every time a Reference is created, its components — “base”, “name”, “strict” — are set to some values. The strict flag is easy — it’s there to denote if code is in strict mode or not. The “name” component is set to identifier or property name that’s being resolved, and the base is set to either property object or environment record.

It might help to think of References as plain JS objects with a null [[Prototype]] (i.e. with no “prototype chain”), containing only “base”, “name”, and “strict” properties; this is how we can illustrate them below:

When Identifier foo is resolved, a Reference is created like so:
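Using the plain-object illustration described above (conceptual only; these are not real runtime values):

```js
// Reference for the identifier `foo`:
// {
//   base: <environment record that holds `foo`>,
//   name: 'foo',
//   strict: false
// }
```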

and this is what’s created for property accessor foo.bar:
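```js
// Reference for the property accessor `foo.bar`:
// {
//   base: foo,
//   name: 'bar',
//   strict: false
// }
```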

This is a so-called “Property Reference”.

There’s also a 3rd scenario — Unresolvable Reference. When an Identifier can’t be found anywhere in the scope chain, a Reference is returned with base value set to undefined:
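```js
// Unresolvable Reference — the identifier isn't found anywhere:
// {
//   base: undefined,
//   name: 'iDontExist',
//   strict: false
// }
```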

As you probably know, Unresolvable References could blow up if not “properly used”, resulting in an infamous ReferenceError (“foo is not defined”).

Essentially, References are a simple mechanism of representing name bindings; it’s a way to abstract both object-property resolution and variable resolution into a unified data structure — base + name — whether that base is a regular JS object (as in property access) or an Environment Record (a link in a “scope chain”, as in identifier resolution).

So what’s the use of all this? Now that we know what ECMAScript does under the hood, how does this apply to this behavior, foo() vs. foo.bar() vs. (f = foo.bar)() and all that?

Function call

What do foo(), foo.bar(), and (f = foo.bar)() all have in common? They’re function calls.

If we take a look at what happens when Function Call takes place, we’ll see something very interesting:
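Paraphrasing the relevant steps of the Function Calls algorithm (ES5 §11.2.3) as a sketch; this is not verbatim spec text:

```js
// ref  = result of evaluating the expression before "()"
// func = GetValue(ref)
// ...
// 6. If Type(ref) is Reference:                          // ← the interesting step
//      a. if IsPropertyReference(ref) is true:
//           thisValue = GetBase(ref)                     //   e.g. foo.bar() → this === foo
//      b. else (base is an Environment Record):
//           thisValue = ImplicitThisValue(GetBase(ref))  //   always undefined → foo()
// 7. Else (ref is not a Reference):
//      thisValue = undefined                             //   e.g. (function(){})()
// 8. Call func with thisValue as `this`
```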

Notice step 6 (the Reference branch), which basically explains both #1 and #2 from Crockford’s list of 4.

We take expression before (). Is it a property reference? (foo.bar()) Then use its base value as this. And what’s a base value of foo.bar? We already know that it’s foo. Hence foo.bar() is called with this=foo.

Is it NOT a property reference? Ok, then it must be a regular reference with Environment Record as its base — foo(). In that case, use ImplicitThisValue as this (and ImplicitThisValue of Environment Record is always set to undefined). Hence foo() is called with this=undefined.

Finally, if it’s NOT a reference at all — (function(){})() — use undefined as this value again.
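Putting the three cases side by side (strict mode, to avoid `this` being coerced to the global object):

```js
'use strict';

var foo = {
  bar: function() { return this; }
};

foo.bar();                        // foo        — property reference, base is foo
var f = foo.bar;
f();                              // undefined  — plain reference, ImplicitThisValue
(function() { return this; })();  // undefined  — not a reference at all
```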

Are you feeling like this right now?

Assignment, comma, and grouping operators

Armed with this knowledge, let’s see if we can explain this behavior of (f = foo.bar)(), (1,foo.bar)(), and (foo.bar)() in terms more robust than “whatever is left of the dot”.

Let’s start with the first one. The expression in question is known as Simple Assignment (=). foo = 1, g = function(){}, and so on. If we look at the steps taken to evaluate Simple Assignment, we’ll see one important detail:
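Paraphrasing the Simple Assignment algorithm (ES5 §11.13.1), again as a sketch rather than verbatim spec text:

```js
// 1. lref = evaluate LeftHandSideExpression
// 2. rref = evaluate AssignmentExpression
// 3. rval = GetValue(rref)     // ← the important detail: the Reference is "unwrapped" here
// 4. PutValue(lref, rval)
// 5. the whole assignment expression evaluates to rval — a plain value, not a Reference
```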

Notice that the expression on the right is passed through the internal GetValue() before assignment. GetValue(), in turn, transforms the foo.bar Reference into an actual function object. And of course we then proceed to the usual Function Call with NOT a reference, which results in this=undefined. As you can see, (f = foo.bar)() only looks similar to foo.bar() but is actually “closer” to (function(){})(), in the sense that it’s an (evaluated) expression rather than an (untouched) Reference.

The same story happens with comma operator:

(1,foo.bar)() is evaluated as a function object and Function Call with NOT a reference results in this=undefined.

Finally, what about grouping operator? Does it also evaluate its expression?

And here we’re in for a surprise!

Even though it’s so similar to (1,foo.bar)() and (f = foo.bar)(), grouping operator does NOT evaluate its expression. It even says so plain and simple — it may return a reference; no evaluation happens. This is why foo.bar() and (foo.bar)() are absolutely identical, having this set to foo since a Reference is created and passed to a Function call.

Returning References

It’s worth mentioning that ES5 spec technically allows function calls to return a reference. However, this is only reserved for host objects, and none of the built-in (or user-defined) functions do that.

An example of this (non-existent, but permitted) behavior is something like this:
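A purely hypothetical illustration (the function name is made up; nothing real behaves this way):

```js
// Imagine a host-provided function that could return the *Reference* foo.bar,
// rather than the function object it resolves to:
getFooBarReference()();
// The trailing () would then see a property reference with base `foo`,
// so bar would run with this === foo.
```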

Of course, the current behavior is that non-Reference is passed to a Function call, resulting in this=undefined/global object (unless bar was already bound to foo earlier).

typeof operator

Now that we understand References, we can take a look at a few other places for a better understanding. Take, for example, the typeof operator:
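Paraphrasing the typeof algorithm (ES5 §11.4.3) as a sketch:

```js
// 1. val = evaluate UnaryExpression
// 2. If Type(val) is Reference:
//      a. if IsUnresolvableReference(val) is true, return "undefined"  // ← no GetValue, no blow-up
// 3. val = GetValue(val)
// 4. Return a string depending on Type(val)

typeof iDontExist; // "undefined" — no ReferenceError thrown
```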

Here is that “secret” for why we can pass unresolvable reference to typeof and not have it blow up.

On the other hand, if we were to use unresolvable reference without typeof, as a plain statement somewhere in code:
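For example, with any undeclared identifier:

```js
iDontExist;
// ReferenceError: iDontExist is not defined
// (evaluating the statement calls GetValue() on the unresolvable Reference, which throws)
```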

Notice how Reference is passed to GetValue() which is then responsible for stopping execution if Reference is an unresolvable one. It all starts to make sense.

delete operator

Finally, what about good old delete operator?

What might have looked like mumbo-jumbo before is now pretty nice and clear:
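A few examples that follow directly from the delete algorithm (ES5 §11.4.1), non-strict mode:

```js
var obj = { x: 1 };
var y = 1;

delete 1;          // true  — operand is not a Reference at all, so delete just returns true
delete obj.x;      // true  — property reference: [[Delete]] is called on the base object (obj)
delete y;          // false — reference to an environment record binding; var bindings aren't deletable
delete iDontExist; // true  — unresolvable reference (non-strict): nothing to delete
```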

Summary

And that’s a wrap!

Hopefully you now understand the underlying mechanism of References in Javascript; how they’re used in various places and how we can “utilize” them to explain this behavior even in non-trivial constructs.

Note that everything I mentioned in this post was based on ES5, being current standard and the most implemented one at the moment. ES6 might have some changes, but that’s a story for another day.

If you’re curious to know more — check out section 8.7 of ES5 spec, including internal methods GetValue(), PutValue(), and more.

P.S. Big thanks to Rick Waldron for review and suggestions!


Refactoring single page app 31 Aug 2014 4:00 PM (11 years ago)

Refactoring single page app

A tale of reducing complexity and exploring client-side MVC

Skip straight to TL;DR.

Kitchensink is your usual behemoth app.

I created it a couple of years ago to showcase everything that Fabric.js — a full-blown <canvas> library — is capable of. We already had some demos illustrating this and that functionality, but kitchensink was meant to be a kind of general sandbox.

You could quickly try things out — add simple shapes or images or SVG’s or text; move them around, scale, rotate, delete, group, change colors, opacity; experiment with locking or z-index properties; serialize canvas into image or JSON or SVG; and so on.

And so there was a good old, single kitchensink.js file (accompanied by kitchensink.html and kitchensink.css) — just a bunch of procedural commands and conditions, really. Pressed that button? Add a rectangle to the canvas. Pressed another one? Load an image. Was object selected on canvas? Enable that button and update its text. You get the idea.

But things change, and over time the app grew and grew until the once-simple kitchensink.js became too big for its own good. I was starting to notice more and more repetition, and problems with navigating and maintaining the code. Those small, weird glitches that live in apps without an authoritative data source; they came along as well.

I was looking at a 1000+ LOC JS file, realizing it’s time to refactor.

But there was a bit of a pickle. You see, kitchensink is all about managing <canvas> through Fabric, and frankly I had no idea how to best approach an app like this. If this was your usual “User” or “Collection” or “TodoItem” data coming from a server or elsewhere, I’d quickly throw together a few Backbone.Models and call it a day. But with Fabric, we have an object model on top of <canvas>, so there’s just a collection of abstract visual objects and a way to operate on those objects.

Is it possible to shoehorn MVC onto all this? What exactly would become a model or the views? Is it even a good idea?

The following is my step-by-step refactoring path, including close look at some MVC-ish solutions. You can use it to get ideas on revamping your own spaghetti app, and/or to see how to approach design of <canvas>-based app, specifically. Each step is made as a separate commit in fabricjs.com repo on github.

Complexity vs. Maintainability

Before changing anything, I decided to do a little experiment and statically analyze the complexity of the app. Not to tell me that it was in a shitty state; that I already knew. I wanted to see how it changed based on different solutions.

There are a few ways to analyze JS code at the moment. There’s the complexity-report npm package, as well as jscomplexity.org (both rely on escomplex). There’s Plato, which provides visual tracking of complexity (based on complexity-report). And there’s good old JSHint; it has its own cyclomatic complexity calculation.

I used complexity-report because it has more granular analysis and has this useful metric — “Maintainability”. What exactly is it and why should we care about it?

Here’s a simple example:

This chunk of code has cyclomatic complexity (CC) of 1. It’s just a single function call. No conditional operators, no loops. Yet, it’s pretty scary (actual code from Fabric.js, btw; shame on me).

Now look at this code:

It also has cyclomatic complexity of 1. But it’s clearly significantly easier to understand and maintain.

Maintainability, on the other hand, is reported as 151 for the first one and 159 for the second (the higher the better; 171 being the highest). It’s still not a big difference but it’s definitely more representative of overall state of the code, unlike cyclomatic complexity.

complexity-report defines maintainability as a function of not just cyclomatic complexity but also lines of code and overall volume of operators & operands (effort):
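For the record, the underlying formula (as documented by escomplex, to the best of my knowledge; treat the exact coefficients and averaging as approximate):

```js
// maintainability = 171
//                 -  3.42 * ln(mean Halstead effort)
//                 -  0.23 * ln(mean cyclomatic complexity)
//                 - 16.2  * ln(mean logical lines of code)
```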

Suffice it to say, it gives more accurate picture of code simplicity and maintainability.

Vanilla JS

It all started with this one big, linear, 1057 LOC JS file. Kitchensink never had any complex DOM/AJAX interactions or animations, so I never even used jQuery in there. Just a plain vanilla JS.

Introducing jQuery

I started by porting all existing DOM interactions to jQuery. I wasn’t expecting any great improvements in code; jQuery can’t help with architectural changes. But it did remove some repetitive code - class handling, element removal, event handling, etc.

It would have also provided good foundation for further improvements, in case I decided to go with Backbone or any other higher-level tools.

Notice how it shaved off ~50 lines of code and even improved complexity from 132 to 116 (mainly removing some DOM handling conditions: think toggleClass, etc.).

Backbone? Angular? Ember?

With easy stuff out of the way, I tried to figure out what to do next. I’ve used Backbone in the past, and I’ve been meaning to try out Angular and/or Ember — 2 of the most popular higher-level solutions. This would be a perfect way to learn them.

Still unsure of how to proceed, I decided to do something very simple. Instead of figuring out which library is the be-all-end-all, I went on to fix the most obvious issue — tight coupling of view logic and all the other (let’s call it “business”) logic.

Separating presentation and “business” logic

I broke kitchensink.js into 3 files: model.js, view.js, and utils.js.

Utils were just some language-level methods used by the app (like getRandomColor or supportsColorpicker). View was to contain purely UI code, and it would reach out to model.js for any state or actions on that state. I called it model.js but really it was a combination of model and controller. The bottom line was that it had nothing to do with the presentation logic.

So this kind of mess (in previous code):

was now separated into view concern:

and model/controller concern:
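Roughly, the before/after looked like this (names and details here are illustrative, not the actual kitchensink code):

```js
// Before: everything in one place
document.getElementById('add-rect').onclick = function() {
  canvas.add(new fabric.Rect({ width: 50, height: 50, fill: getRandomColor() }));
};

// After — view concern (view.js): wire UI events to model/controller methods
$('#add-rect').on('click', function() {
  kitchensink.addRect();
});

// After — model/controller concern (model.js): canvas logic, no DOM knowledge
kitchensink.addRect = function() {
  canvas.add(new fabric.Rect({ width: 50, height: 50, fill: getRandomColor() }));
};
```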

Separating presentation logic from everything else had a dramatic effect on the state of the app.

Yes, there was an inevitable increase in lines of code (714 -> 829), but both complexity and maintainability improved dramatically. Overall CC went from 116 to 98, and more importantly, it was now significantly lower per file. The biggest chunk was now in the model (cc=70), and the view became thin and easy to follow (cc=26).

Maintainability rose from 104 to ~125.

Introducing convention

Looking at the code revealed a few more possible optimizations. One of them was to use a convention when enabling/disabling the buttons representing canvas object actions. Instead of keeping references to them in the code, then disabling/enabling them through those references, I gave them all a specific class name (“btn-object-action”) right in the markup, then toggled their state with the help of jQuery.
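Something along these lines (a sketch):

```js
// enable object-action buttons only when there's an active object on canvas
$('.btn-object-action').prop('disabled', !canvas.getActiveObject());
```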

The changes weren’t very impressive, but complexity of the view went down from 26 to 21, SLOC went from 829 to 805. Not bad.

Backbone

At this point, the app complexity was concentrated in model/controller file. There wasn’t much I could do about it since it was all pure “business” logic: creating objects, manipulating objects, keeping their state, etc.

However, there was still some room for improvement in the view corner.

I decided to start with Backbone. I only needed a fraction of its capabilities, but Backbone is relatively “lean” and provides a nice, declarative abstraction of certain common view operations, such as event handling. Changing the plain kitchensink view object to a Backbone.View would let me take advantage of that.

Instead of assigning event handlers manually, there was now this:
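A sketch of what that looks like (selectors and method names here are illustrative):

```js
var KitchensinkView = Backbone.View.extend({
  events: {
    'click #add-rect':   'onAddRectClick',
    'click #add-circle': 'onAddCircleClick',
    'change #opacity':   'onOpacityChange'
  },
  onAddRectClick: function() { kitchensink.addRect(); },
  onAddCircleClick: function() { kitchensink.addCircle(); },
  onOpacityChange: function(e) { kitchensink.setOpacity(e.target.value); }
});
```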

First application of (Backbone-esque) MVC

At the same time, model-controller was now implemented as Backbone.Model and was letting views know when to update themselves. This was an important change towards a different architecture. View was now observing model-controller for changes, and re-rendering itself accordingly. And model-controller fired change event whenever something would change on canvas itself.

In model-controller:
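Presumably along these lines (a sketch of the forwarding described below):

```js
// forward Fabric's canvas events as the controller's single "change" event
['object:selected', 'object:added', 'selection:cleared'].forEach(function(eventName) {
  canvas.on(eventName, function() {
    kitchensink.trigger('change');
  });
});
```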

Remember I mentioned abstract canvas state and interactions?

Notice the bridge between canvas and the model/controller object: the “object:selected”, “object:added”, and “selection:cleared” canvas/Fabric events were all forwarded as the controller’s single “change” event.

In view:
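And the view side, roughly (again, method names are illustrative):

```js
var KitchensinkView = Backbone.View.extend({
  initialize: function() {
    this.listenTo(kitchensink, 'change', this.render);
  },
  render: function() {
    var hasActiveObject = !!kitchensink.getActiveObject();
    this.$('.btn-object-action').prop('disabled', !hasActiveObject);
    return this;
  }
});
```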

As an example, now when a user selected an object on canvas, the model-controller would trigger a change event and the view would re-render itself. Then, during render, the view would ask the model-controller — is there an active object? — and, depending on the answer, render the corresponding buttons in either enabled or disabled state, with one text or the other.

This felt like a good improvement in the right direction, architecture-wise.

Views became more declarative and easier to follow. SLOC went down a bit (787 -> 772), and view complexity was now even less (from 21 to 16). Unfortunately, maintainability of model went slightly down.

Backbone.unclassified

Backbone made views more declarative, but there was still some repetition I wasn’t happy with:

Notice how the “#lock-horizontally” selector is repeated twice. This is bad both for maintenance (my main concern) and performance. In the past, I’ve used the tiny backbone.unclassified extension to alleviate this problem, and so I went with it again:

Notice how we create an “identifier” for an element in ui “map”, and then use that identifier in both events “map” and in the rendering code.

This made views even more declarative, albeit at the expense of slightly more cruft overall. Complexity and maintainability stayed more or less the same.

Breaking up the view

The KitchensinkView was already clean and beautiful. Half of it was simple declarative one-liners (clicked this button? call that model method) and the rest was pretty simple and linear rendering logic.

But there was something else.

Entire view logic/rendering of an app was stuffed in one file. The declarative “events” hash, for example, was spanning ~200 lines and was becoming daunting to look through. More importantly, this one file included multiple concerns: object controls section, section for adding objects, global canvas controls section, text handling section, and so on. Yes, these are all view concerns but they’re also logically distinct view concerns.

What to do? Break it into multiple views!

The code size obviously increased once again, but look what happened with views maintainability. It went from 132 to 145! A significant and expected improvement.

Of course I didn’t need complexity report to tell me that things got better. I was now looking at 5 beautiful concise view files, each with its own rendering logic and behavior. As a nice side effect, some of the views (e.g. AddCommandsView) became entirely declarative.

2-way binding

At this point, I was fully satisfied with the way things turned out.

Backbone (with unclassified extension) and multiple views made for a pretty clean app. Backbone felt almost perfect here as there was none of the more complicated logic of nested views/collections, animations/transition, routing, etc. I knew that adding new functionality or changing existing one would be straightforward; multiple views meant easy scaling and easy addition of new ones.

What could possibly be better…

Determined to continue further and see where it takes me, I took another look at the views:

This is ObjectControlsView, and I’m only showing 2 pieces of functionality here: the lock-toggling button and the opacity slider. Notice how their behaviors have something in common. There’s an event (“click” or “change”) that maps to a model action, and then there’s the rendering logic — updating the button text or updating the slider value.

Don’t you find the cruft inside render just a bit too repetitive and unnecessary? Wouldn’t it be great if we could just update “opacity” or toggle lock value on a model, not caring about rendering of corresponding control? So that opacity slider automatically knew to update itself, once opacity on a model changed. Ditto for toggling button.

Did someone say… data binding?

Of course! I just had to see what introducing data-binding would do to an app. Unfortunately, Backbone doesn’t have it built-in, unlike other MV* solutions — Knockout, Angular, Ember, etc.

I wanted to stick to Backbone for now, instead of trying something completely different, which meant using an addon of some sort.

backbone.stickit

I tried backbone.stickit first, but couldn’t get it to work at all with kitchensink’s model/controller methods.

You see, binding view to a regular Backbone model is easy with “stickit”. Just define a hash with selector ↔ attribute mapping:

Unfortunately, our model is <canvas>-based and all the state needs to be set & retrieved via a proxy. This means using methods, not properties.

We can’t just map opacity slider to “opacity” attribute on a model. We need to map it to canvas.getActiveObject().opacity (possibly checking that getActiveObject() returns object in the first place) via custom getters/setters.

Epoxy

Next there was Epoxy.js, which defines bindings like so:

Again, easy with plain attributes. Not so much with methods. I tried to implement it via computed properties but without much success.

Rivets.js

Next there was Rivets.js, and while I was expecting another painful “adaptation”, it surprisingly just worked out of the box!

Rivets turned out to be pretty low-level, but also very flexible. Docs quickly revealed how to use methods instead of properties. The binding could be initialized like so:
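The call itself is a one-liner; the `app` key is what the markup refers to:

```js
rivets.bind(document.body, { app: kitchensink });
```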

And the markup would then be parsed for any “rv-…” attributes (prefix could be changed). For example:

The great thing was that I could just write app.getBgColor and it would call that method on kitchensink since that’s what was passed to rivets.bind() as an app. No limitations of only working with Backbone.Model attributes. While this worked for one-way binding, with 2-way binding (where view also needs to update the model), I would need to write custom adapter…

It sounded daunting but turned out rather straightforward:

Now, I could add this in markup (notice the use of special ^ separator, instead of default .):

..and it would use a nice convention of calling getCanvasBgColor as a getter and setCanvasBgColor as a setter, when changing the colorpicker value.

There was no longer a need for manual (even if declarative) event listeners:

Downsides of markup-based bindings

I didn’t exactly like this whole setup.

I’d prefer to have bindings right in the code, to have a “birds-eye” understanding of which elements map to which behavior. It would also be easier and more understandable to map multiple elements. If I wanted a set of buttons to toggle their enabled/disabled state according to a certain state of the canvas — and I did want that — I couldn’t just do something like:

I had to write custom binder instead, and that’s certainly more obscure and harder to understand. Speaking of custom binders…

Rivets makes it easy to create them. Binders are those “rv-…” directives we saw earlier. There are a few built-in ones — “rv-value”, “rv-checked”, “rv-on-click” — and it’s easy to define your own.

In order to toggle buttons state, I wrote this simple 1-way binder:
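A sketch of such a one-way binder (the descendant-button selector is an assumption on my part):

```js
rivets.binders.enable = function(el, value) {
  // enable/disable all descendant buttons based on the bound value
  $(el).find('button').prop('disabled', !value);
};
```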

It was now possible to use “rv-enable” on a parent element to enable or disable descendant buttons:

But imagine reading unknown markup like this, trying to understand which directive controls what, and how far it spans…

Another binder I added was “rv-val”, as an alternative to “rv-value” (with the exception of observing “keyup” rather than “change” event on an element):

You can see that adding binders is simple, they’re easy to read, and you can even reuse existing behavior (rivets.binders.value.routine in this case).

Finally, there’s a convenient support for formatting, which was just perfect for changing toggleable text on some elements:

Notice how “rv-text” contents include | toggle smth smth. This is a custom formatter, defined like this:
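Roughly like this (a sketch; as noted below, the actual argument handling turned out to be more restrictive than one might like):

```js
rivets.formatters.toggle = function(value, valueWhenTrue, valueWhenFalse) {
  return value ? valueWhenTrue : valueWhenFalse;
};
```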

The button text was now determined according to app^horizontalLock (which desugars to app.getHorizontalLock()) and when passed to toggle formatter, would come out either as one or the other. Unfortunately, formatter falls a bit short; it seems that its values can’t be strings, which makes things much less convenient.

Unlike with behavior, keeping alternative UI text directly in HTML felt perfect. Text stays where text should be — in markup; it makes localization easier; it’s easy to follow.

On the other hand, I didn’t like keeping model/controller actions right in the markup:

It’s especially bad when some of the view behavior is somewhere in a JS-based view/controller, and some — in the markup. YMMV.

So what happened to the code?

After moving app logic from JS views to HTML (via Rivets’ “rv-“ attributes), all that was left from the views were these 3 lines:

Amazing, right? Or not so much?

Yes, we practically eliminated JS-based view, moving logic/behavior to markup and/or model-controller. But let’s look at the stats:

There was now an additional (32 SLOC) data_binding_adapter.js file, which included all the customizations and additions for Rivets.js. Still, there was a dramatic reduction in SLOC (830 -> 715); expected, since a lot of logic moved to the markup. The views’ maintainability was still ~145, but the model-controller surprisingly went from 116 to 125! Even though more code moved to the model-controller, that code was now simpler — usually a pair of getters/setters for a particular piece of state.

So how does this compare to the very first step — the monolithic spaghetti code?

Improvement across the board. And what about HTML, where so much logic was moved to?

Ok, 100 lines longer, and only 3KB heavier. Doesn’t seem too bad.

But was this really an improvement? All the HTML declarations and all the abstraction felt like 1 step forward, 2 steps back. It seemed harder to understand and likely harder to maintain. While complexity tool showed improvement, it was only improvement on JS side, and of course it couldn’t give holistic analysis.

I wanted to take a step back and try something else.

Breaking up controller

Aside from markup contamination, the problem was the model-controller becoming too fat; that one file was still sitting at a complexity of 70.

What if I could keep Rivets.js for now, but break the model-controller into multiple controllers, each for a distinct behavior? A very thin model would then serve as a proxy between <canvas> and controller actions. After some experimentation and pondering on the best way to organize something like that, I ended up with this:

The model was now <canvas> itself! There were no JS-based views, and all the logic was in 5 distinct controllers. But how was this possible? Shouldn’t canvas actions go through some kind of proxy to normalize all the canvas.getActiveObject().{get|set}Something() voodoo? Yes, that was still needed, but all the proxying was now happening in the controllers themselves.

I created CanvasController, inheriting from Backbone.Model (to get event management), and gave it very minimal generic behavior (getStyle, setStyle, triggerChange). Those methods are what served as the proxy between canvas and controllers. Controllers implemented specific getters/setters via those methods (inherited from the parent CanvasController “class”).
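A rough sketch of that arrangement (method bodies, and any names beyond getStyle/setStyle/triggerChange, are illustrative rather than the actual kitchensink code):

```js
var CanvasController = Backbone.Model.extend({
  getStyle: function(prop) {
    var obj = canvas.getActiveObject();
    return obj && obj.get(prop);
  },
  setStyle: function(prop, value) {
    var obj = canvas.getActiveObject();
    if (obj) {
      obj.set(prop, value);
      this.triggerChange();
    }
  },
  triggerChange: function() {
    this.trigger('change');
    canvas.renderAll();
  }
});

// a specific controller then builds its getters/setters on top of those
var TextController = CanvasController.extend({
  getFontSize: function() { return this.getStyle('fontSize'); },
  setFontSize: function(value) { this.setStyle('fontSize', value); }
});
```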

How did this all look complexity-wise?

SLOC stayed the same, but what happened to complexity? Not only did it go down to a total of 68, the max complexity per file was now only 18! There was no longer one big business-logic file with cc=70, just small controller files with cc<=20. Definitely an improvement.

Unfortunately, maintainability went slightly down (to 128), likely due to all the additional cruft.

Even though this was the best case complexity-wise, I still wasn’t too happy with this solution. There were still bindings in HTML, and the canvas controllers felt a bit too abstracted (i.e. it would take some time to understand how the app works, and how to change or extend it).

Angular

Multiple controllers reminded me of what I’ve seen in Angular tutorials. It seemed natural to try Angular and see how it compares to the last (Backbone + Rivets) solution, since it looked so similar.

Angular learning curve is definitely steeper. It took me ~2 days to understand and get comfortable with Rivets data-binding. It took ~2 weeks to understand and get comfortable with Angular data-binding (watches, digest cycle, directives, etc.).

Overall, implementing kitchensink via Angular felt very similar to Backbone + Rivets combo. But, as with everything, there were pros and cons.

The Good

In Angular, there’s no need to Function#bind methods to a model (when calling them from within attribute values). For example, rv-on-click="app.foo" calls app.foo() in context of element, whereas Angular’s ng-click="foo()" calls foo in context of $scope. This proves to be more convenient.

Using the same example of rv-on-click="app.foo" vs. ng-click="foo()", braces after name make it more clear that it’s a function call.

Function calls are also more concise. For example, rv-show="app.getSelected" vs. ng-show="getSelected()". There’s no need to specify app since getSelected is looked up automatically on $scope.

Mostly syntactic preference, but <button>{{ ... }}</button> (in Angular) is easier to read/understand than <button rv-text></button>.

The not so Good

The biggest drawback was getting started and understanding how to plug kitchensink’s unique “model” into Angular. I was also unlucky to have run into an issue with {{ … }} conflicting with Jekyll’s {{ … }}. It took quite some time to figure out why in the world Angular was not “initializing”…

It’s a bit annoying that Angular’s methods start with $ and “interfere” with a common convention of referencing jQuery objects via $xxx. Just a minor additional cognitive burden if you’re used to that notation.

There were some minor things like Angular’s $element.find() limiting lookup by tagName even when jQuery was available. Weird.

Most importantly, custom 2-way binding was non-trivial, unlike with Rivets documentation which made it very clear. With Angular, it’s pretty much impossible to use custom accessors in attribute values. We can’t do that elegant Rivets trick of app^selected desugaring to app.getSelected() and app.setSelected(). Of course Angular’s directives kind of solve this, but it’s not the same.

Why? Because in Rivets, you can plug this custom adapter anywhere, including Rivets’ “native” binders!

Take this radio group, use built-in rv-checked attribute, and it just works:

This can not be done in Angular, and so we need to implement our own “radio group” handling via directive. Directives are somewhat similar to Rivets’ ones, although of course much more powerful.

To implement accessors, I created bindValueTo directive, to be used like this:

Now, the slider would call getFontSize() to retrieve the value, and setFontSize(value) to set it. Once I understood directives, it was fairly straightforward:
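A sketch of how such a directive might look — this is my reconstruction, not the original code; the module name, the accessor-naming convention, and the radio handling are all assumptions:

```js
var app = angular.module('kitchensink', []); // module name is illustrative

app.directive('bindValueTo', function() {
  return {
    restrict: 'A',
    link: function($scope, $element, $attrs) {
      // "fontSize" → getFontSize() / setFontSize(value)
      var name = $attrs.bindValueTo;
      var capitalized = name.charAt(0).toUpperCase() + name.slice(1);
      var getter = 'get' + capitalized;
      var setter = 'set' + capitalized;

      // model → view
      $scope.$watch(function() { return $scope[getter](); }, function(value) {
        if ($element[0].type === 'radio') {
          $element[0].checked = ($element[0].value == value);
        }
        else {
          $element.val(value);
        }
      });

      // view → model
      $element.on('change', function() {
        if ($element[0].type === 'radio' && !$element[0].checked) return;
        var value = $element.val();
        $scope.$apply(function() { $scope[setter](value); });
      });
    }
  };
});
```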

Notice the additional $element[0].type === 'radio' branch for that radio group case I mentioned earlier.

Clarity vs. Abstraction

When it comes to Angular, I feel it’s important to strike a balance between abstraction & clarity. Take this toggle button, for example. A common chunk of functionality in kitchensink, used a dozen times.

Now, this is a fairly understandable piece of markup — putting the issue of mixed content/behavior aside — accessor methods toggling the state, element class and text updating accordingly. Yet, this is a common functionality. So to avoid repetition, it could be abstracted away into its own directive.

Imagine it being written like this:

Certainly cleaner and easier to read, but is it easier to understand? As with any abstraction, there’s now an extra level underneath, so it’s not immediately clear what’s going on.

So how did porting to Angular affect complexity/maintainability scores?

Compared to the previous Backbone/Rivets combo, SLOC went from 715 to 660. Complexity — from 68 to 65, and maintainability — from 128 to 126. Interesting.

The reduction in SLOC was expected, knowing Angular’s nature of putting controller “entries” right in the markup. Complexity and maintainability, on the other hand, practically stayed the same.

HTML size

If you’re wondering how this refactoring affected size of the main HTML file, the picture is very simple and straightforward.

As expected, it’s been continuously growing little by little, with a spike from the markup-based solutions like Rivets and Angular. Curiously, while Angular resulted in higher SLOC, it was actually fewer KB compared to Rivets.

To summarize

Unfortunately, other MV* libraries (Ember, Knockout, etc.) didn’t make it into my exploration. I was constrained on time, and I’d already come to a much more maintainable solution. I do hope to try something else in the near future. It’ll be interesting to see how yet another concept ties into the app. Stay tuned for part 2.

My final conclusion was that Backbone+Rivets and Angular provided relatively similar benefits, with almost exact complexity/maintainability scores, and only different distribution of logic (attributes in markup vs. methods in JS “controller”). The pros/cons I mentioned earlier are what constituted the main difference.

TLDR and Takeaway points


HTML minifier revisited 27 Jul 2014 4:00 PM (11 years ago)

HTML minifier revisited

4 years ago I wrote about and released HTMLMinifier. Back then, there were almost no tools for proper HTML minification; unless you considered things like “Absolute HTML Compressor” for Windows 95/98/XP/2000 or Java-based HTMLCompressor.

I haven’t been working on it all that much, but occasionally would add a feature, fix a bug, add some tests, refactor, or pull someone’s generous contribution.

Fast forward to these days, and HTMLMinifier is no longer a simple experimental piece of Javascript. With over 400 tests, running on Node.js and packaged on NPM (with 120+ dependents), having CLI, grunt/gulp modules, benchmarking suite, and a number of improvements over the years, it became a rather viable tool for someone looking to squeeze the most out of front-end performance.

Seeing how the minifier gained quite a few new additions over the years, I thought I’d give a quick rundown of what changed and what it’s now capable of.

Better HTML5 conformance

We still rely on John Resig’s HTML parser but it is now heavily tweaked to conform to HTML5 and to provide more flexible parsing.

A common problem was inability to “properly” recognize block elements within inline ones.

This was not allowed in HTML4 but is now OK in HTML5.

Another issue was with custom elements (e.g. <my-component>test</my-component>). While, technically, not part of HTML5, browsers do tolerate such cases and so does minifier.

Keeping closing slash and case sensitivity (XHTML, SVG, etc.)

Two other commonly requested features were keeping end tag closing slash and case-sensitivity. Both of these are useful when minifying SVG (or XHTML) documents. Having HTML4 parser at heart, and considering that in 99% of the cases trailing slashes serve no purpose, minifier would always drop them from the output. It still does, but you can now turn this behavior off.

Ditto for case-sensitivity — there’s an option for those looking to have finer control.

Ignoring custom comments and <!--!

With the rise of client-side MVC frameworks, HTML comments became more than just comments. In Knockout, for example, there’s a thing called containerless control flow syntax, where you can have something like this:

It’s useful to be able to ignore such comments, while removing “regular” ones, so minifier now allows for exactly that:
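For example (the exact regexes are up to you; these are illustrative ones for Knockout's `<!-- ko ... -->` / `<!-- /ko -->` comments):

```js
var minify = require('html-minifier').minify;

var result = minify(html, {
  removeComments: true,
  // but keep Knockout's containerless control-flow comments
  ignoreCustomComments: [ /^\s*ko\b/, /^\s*\/ko\b/ ]
});
```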

Relatedly, we’ve also added support for generic ignored comments — those starting with <!--!. You might recognize this pattern from de-facto standard among Javascript libraries — comments starting with /*! are ignored by minifiers and are often used for licenses.

If you’d like to ignore an entire chunk of markup from minification, you can now simply wrap it with <!-- htmlmin:ignore --> and it’ll stay untouched.

Finally, we now ignore anything surrounded by <%...%> and <?...?> which is often useful when working with server-side templates, etc.

Custom attributes

Another twist on your regular HTML that we see in client-side MVC frameworks is non-standard attribute names, values, and everything in between.

Example of Handlebars’ dynamic attributes:

Most of the HTML4/5 parsers will fail here, choking on { in {{#if as an invalid attribute name character.

We worked around this by adding support for customAttrSurround option, in which you can specify an array of regexes to match anything surrounding attributes:

But wait, there’s more! Attribute names are not the only offenders.

Here’s an example from Polymer; notice ?= as an attribute assignment characters:

Only a few days ago we added support for the customAttrAssign option, similar to customAttrSurround (thanks, Duncan Beevers!), which you can call like so:

Scripts as templates

Continuing on the topic of MVC frameworks, we’ve also added support for an often-used pattern of scripts-as-templates:

AngularJS:

Ember.js

There’s no reason not to minify the contents of such scripts, and you can now do this via the processScripts option:
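For example (the script types shown are the common Angular/Ember ones; adjust to taste):

```js
minify(html, {
  collapseWhitespace: true,
  removeComments: true,
  // also minify the contents of these script-as-template elements
  processScripts: [ 'text/ng-template', 'text/x-handlebars-template' ]
});
```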

JS/CSS minification

Now, what about “regular” scripts?

We decided to go a step further, providing a way to minify contents of <script> elements and event handler attributes (“onclick”, “onload”, etc.). This is being delegated to an excellent UglifyJS2.

CSS isn’t left behind either; we can now pass contents of style elements and style attributes through clean-css, which happens to be the best CSS compressor at the moment.

Both of these features are optional.
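Enabling them is just a matter of flipping the corresponding options:

```js
minify(html, {
  minifyJS: true,  // <script> contents and on* event handler attributes go through UglifyJS2
  minifyCSS: true  // <style> contents and style="" attributes go through clean-css
});
```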

Conservative whitespace collapse

If you’d like to play it safe and make minifier always leave at least 1 whitespace where it would otherwise completely remove it, there’s now an option for that — conservativeCollapse.

This could come in useful if your page layout/rendering depends on whitespace, such as in this example:

The minifier doesn’t know that the input-preceding element is rendered as inline-block; it doesn’t know that the whitespace around it is significant. Removing that whitespace would render the checkbox too close (squished) to the “label”.

This is when “conservativeCollapse” (and that extra space) comes in useful.

Max line length

Another recently-introduced customization is maximum line length. An interesting use case is that some email servers automatically add a new line after 1000 characters, which breaks (minified) HTML. You can now specify line length to add newlines at valid breakpoints.

Benchmarks

We also have a benchmark suite now that goes over a number of “source” files (front pages of popular websites), minifies them, then reports size comparison and time spent on minification.

How does HTMLMinifier compare [1] to the other solutions out there (Will Peavy’s online minifier and a Java-based HTMLCompressor)?

| Site | Original size (KB) | HTMLMinifier (KB) | Will Peavy (KB) | htmlcompressor.com (KB) |
|---|---|---|---|---|
| HTMLMinifier page | 48.8 | 37.3 | 43.3 | 41.9 |
| ES6 table | 117.9 | 79.9 | 92 | 91.9 |
| MSN | 156.6 | 133 | 145 | 138.3 |
| Stackoverflow | 200.4 | 159.5 | 168.3 | 163.3 |
| Amazon | 245.9 | 206.3 | 225 | 218.5 |
| Wikipedia | 401.4 | 380.6 | 396.3 | n/a |
| Eloquent Javascript | 869.5 | 830 | 872 | n/a |

Not too bad!

Notice remarkable savings (~40KB) on large static files such as a one-page Eloquent Javascript.

Future plans

Minifier has come a long way, but there’s always room for improvement.

There are a few more bugs to squash and a few features to add. I also believe there are more optimizations we could perform to get the best savings — whether it’s reordering attributes to aid gzip compression or more aggressive content removal (spaces, attributes, values, etc.).

One concern I have is how long it takes to minify large (500KB+) files. While it’s unlikely that someone uses the minifier in real time (rather than as a one-time compilation step), it’s still unacceptable for minification to take more than 1-2 minutes. This is something we could try fixing in the future.

We can also monitor performance stats — both size (as well as gzipped?) and time taken — on each commit, to get a good picture of whether things change for the better or worse.

As always, I welcome you to try minifier in your projects, report any bugs/suggestions, and help with whatever you can. Huge thanks goes to all the contributors without whom we wouldn’t have come this far!

[1] Benchmarks performed on OS X 10.9.4 (2.3GHz Core i7).


JSCritic 26 Mar 2014 4:00 PM (11 years ago)

JSCritic

Choosing a good piece of Javascript is hard.

Every time I come across a newly-released, shiny plugin or library, I wonder what’s going on underneath. Yes, it looks pretty and convenient but what does underlying code look like? Does it browser sniff or extend the DOM? Does it pollute global scope? What about compatibility with older browsers; could it be that it utilizes, say, ES5 getters/setters making it unusable in IE<9?

I always wished there was a way to quickly check how well a certain script behaves. Not like we did back in the days.

The best tool for a code quality check like this is undoubtedly JSHint [1]. It can answer most of those questions and many more. Unfortunately, the “many more” part is a bit of a problem. Plugging a script’s code into jshint.com usually yields tons of issues, not just with browser compatibility or global variables but also code style. These checks are a must for your own scripts, but for 3rd-party code I don’t really care about missing semicolons (despite my love of them), whether constructors begin with uppercase, or if assignments happen in conditional statements. I only wish to know how well a script behaves on the outside. Now, a sloppy code style can certainly be an indication of bad script quality overall. But more often than not, it’s a preference, not a problem.

A few days ago, I decided to hack something together; something simple that would allow me to quickly plug in a script and see the big picture.

So I made JSCritic.

Plug in script code and it answers some of the more pressing questions.

I tried using Esprima at first, but quickly realized that most of the checks I care about are already in JSHint. So why not piggyback on that? JSCritic turned out to be a simple wrapper on top of it. I originally wrote it in Node, to be able to pass it a filename and quickly see the results, then ported it to run in a browser.

You can still run it in both.

Another thing I wanted to see is minified script size. Some plugins have minified versions, some don’t, some use better minifiers, some worse. I decided to minify content through UglifyJS — a de facto standard of minification at the moment — to get an objective overview of code size. Unfortunately, browser version of UglifyJS seems to be choking more often than Node one, so it might be safer to use the latter.

I have to say that JSCritic is more of a prototype at the moment. Static analysis has its limitations, and so does JSHint. I haven’t had much time to polish it, but I’m hoping to improve it in the near future, or with the help of ever-awesome contributors. One thing to emphasize is that for best results you should use non-minified source code (you’ll see exactly why below)!

If you want to know more about tests, implementation details, and drawbacks, read on. Otherwise, hope you find it as useful as I do.

Global variables

Let’s first take a look at global variables detection. Unfortunately, it seems to be very simplistic in JSHint, failing to catch cases other than plain variable/function declarations in top level code.

Here it catches foo, bar, and qux as expected, but fails with all of these:

Granted, detecting globals via static analysis is hard. A more robust solution would be to actually execute code and compare global object “signature”, just like I did in detect-global bookmarklet back in 2009 (based on a script by Remy Sharp). Unfortunately, executing script is also not always easy, and global properties could be exported from various places (e.g. methods that need to be called explicitly); we have no idea which places those are.

Still, JSHint catches a good number of globals and accidental leaks like these:
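For example, classic accidental leaks that JSHint does flag (an illustrative snippet of mine):

```js
var options = { a: 1, b: 2 };

function init() {
  count = 0;           // assignment without `var` — `count` leaks to the global scope
  for (key in options) // missing `var` — so does `key`
    count += options[key];
}
```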

It gives a decent overview, but you should still look through variables carefully as some of them might be false positives. I’m hoping this will be made more robust in the future JSHint versions (or we could try using hybrid detection method — both via static analysis and through global object signature).

Extended natives

Detecting native object extensions has a few limitations as well. While it catches both Array and String in an example like this:

..it fails with all of these:

As you can see, it’s also simplistic and could have false negatives. There’s an open JSHint issue for this.

eval & document.write

Just like with other checks, there are false positives and false negatives. Here’s some of them, just to give an idea of what to expect:

and with document.write:

Browser compatibility

I included 3 checks for browser/engine compatibility — Mozilla-only extensions (let expressions, expression closures, multiple catch blocks, etc.), things IE chokes on (e.g. extra comma), and ES6 additions (array comprehensions, generators, imports, etc.). All of these things could affect cross-browser support.

Browser sniffing

To detect browser sniffing, we first check statically for an occurrence of the navigator implied global (via JSHint), then check the source for occurrences of navigator.userAgent. This covers a lot of cases, but obviously won’t catch any obscurities, so be careful. To make things easier, a chunk of code surrounding navigator.userAgent is pasted for inspection purposes. You can quickly check what it’s there for (is it for non-critical enhancement purposes, or could it cause subtle bugs and/or full breakage?).

Unused variables

Finally, I included unused variables check from JSHint. While not exactly an indication of external script behavior, seeing lots of those could be an indication of sloppy (and potentially buggy) code. I put it all the way at the end, as this is the least important check.

So there it is. The set of rules can definitely be made larger (does it use ES5 features? does it use browser-sniffing-like inference? does it extend the DOM?) and more accurate in the future. For now you can use JSCritic as a quick first look under the hood.

[1] and perhaps ESLint, but I haven't had a chance to look into it.


State of function decompilation in Javascript 13 Jan 2014 3:00 PM (11 years ago)

State of function decompilation in Javascript

It’s always fun to see something described as “magic” in Javascript world.

One such example I came across recently was AngularJS dependency injection mechanism. I’ve never been familiar with the concept, but seeing it in practice looked clever and convenient. Not very magical though.

What is it about? In short: defining required “modules” via function parameters. Like so:

Notice the $scope, $timeout, $http identifiers.

Aha. So instead of passing them as strings or vars or whatever, they’re defined as part of the source. And of course to “read” the source there could only be one thing involved…

Function decompilation.

The kind that we used in Prototype.js to implement $super back in 2007? Yep, that one. Later making its way to Resig’s simple inheritance (used in a safe fashion) and other places.

Seeing a modern framework like Angular use function decompilation got me surprised. Even though it wasn’t something Angular relied on exclusively, this black magic has been somewhat frowned upon for years. I wrote about some of the problems associated with it back in 2009.

Something so inherently non-standard and so varying among implementations could only be compared to user agent sniffing.

But is it, really? Could it be that things are not nearly as bad these days? I last investigated this 4 years ago — a significant chunk of time. Could it be that implementations came to some kind of unification, when it comes to function string representation? Am I completely outdated?

Curious, I decided to take a look at the current state of affairs. Could function decompilation be relied on right now? What exactly could we rely on?

But first..

Theory

To put it simply, function decompilation is the process of accessing function code as a string (then parsing its body or extracting arguments or whatever).

In Javascript, this is done via toString() of function objects, so fn.toString() or String(fn) or fn + '' or anything else that delegates to Function.prototype.toString.

The reason this is deemed unreliable in Javascript is due to its non-standard nature. A famous quote from ES5 spec states:

15.3.4.2 Function.prototype.toString( )

An implementation-dependent representation of the function is returned. This representation has the syntax of a FunctionDeclaration. Note in particular that the use and placement of white space, line terminators, and semicolons within the representation String is implementation-dependent.

Of course, when something is implementation-dependent, it’s bound to deviate in all kinds of ways imaginable.

Practice

..and it does. You would think that a function like this:
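An illustrative example (mine, not the original post’s):

```js
function foo(x) {
  // double the value
  return x * 2;
}
```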

.. would be serialized to a string like this:
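That is, character for character, comments and newlines included:

```js
String(foo);
// "function foo(x) {\n  // double the value\n  return x * 2;\n}"
```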

And it almost does. Except when some engines omit newlines. And others omit comments. And others omit “dead code”. And others include comments around (!) function. And others hide source completely…

Back in the days, things were really bad. Safari <=2.x, for example, didn’t even conform to valid Function Declaration syntax. It would go wild with things like “(Internal Function)” or “[function]” or drop identifiers from NFE’s, just because.

Back in the days, some of the mobile browsers (Blackberry, Opera Turbo) hid the code completely, replacing it with a polite “/* source code not available */” comment (or similar), supposedly to “save” on memory. A fair optimization.

Modern days

But what about today? Surely, things must have gotten better. There’s a convergence of engines, domination of relatively sane WebKit, lots of standardization, and tremendous increase in engines performance.

And indeed, things are looking good. But it’s not nearly all nice and peachy yet, and there’s more “fun” on the horizon.

I made a simple test page, checking various cases of functions and their string representations. I then tested it on desktop browsers, including pretty “old” ones (IE6+, FF3+, Safari4+, Opera 9.6+, Chrome), as well as a slew of mobile browsers, and looked at common patterns.

Decompilation purpose

It’s important to understand the different purposes of function decompilation in Javascript.

Serializing native, built-in functions is different from serializing user-defined ones. In the case of Angular, for example, we’re talking about user-defined functions, so we don’t have to concern ourselves with the way native functions are serialized. Moreover, if we’re only retrieving arguments, there are definitely fewer deviations to deal with than if we wanted to “parse” the source code.

Some things are more reliable; others — less so.
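For the argument-retrieval case, a naive sketch looks something like this. It is a simplified approximation of what Angular-style injection does, not its actual implementation (which also strips comments and handles more edge cases):

```js
// Pull parameter names out of a function's string representation.
function extractArgs(fn) {
  var source = Function.prototype.toString.call(fn);
  // Grab everything between the first "(" and the first ")".
  var match = source.match(/^[^(]*\(\s*([^)]*)\)/);
  if (!match) return [];
  return match[1]
    .split(',')
    .map(function(arg) { return arg.trim(); })
    .filter(Boolean);
}

extractArgs(function($scope, $timeout, $http) {}); // → ["$scope", "$timeout", "$http"]
```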

User-defined functions

When it comes to user-defined functions, things are pretty uniform.

Aside from oddball and dying environments like IE<9 — which sometimes includes comments (and even parens) around functions in their string representation — or Konqueror, which omits function body brackets from new Function-generated functions.

Most of the deviations are in whitespace (and newlines). Some browsers (e.g. Firefox <17) strip all comments from source code, and remove “dead”, unreachable code.
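For instance, something along these lines (illustrative; exact output differs across versions):

```js
function f() {
  // an important comment
  if (false) { neverCalled(); }
  return 1;
}

// Firefox <17, which rebuilt source via its decompiler, would give back
// roughly "function f() {\n    return 1;\n}": comment and dead branch gone.
// Engines that keep the original source text return it as written above.
String(f);
```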

But don’t get too excited as we’ll talk about what future holds in just a bit…

Function constructor

Things are also a bit hectic with generated functions (using new Function(...)), but not by much. While most engines create the function with an “anonymous” identifier, the spacing and newlines are inconsistent. Chrome also inserts an extra comment after the parameter list (an extra comment never hurts, right?).
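For instance, an illustrative generated function:

```js
var fn = new Function('x', 'y', 'return x + y');
```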

becomes:
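roughly this in Chrome (note the “anonymous” identifier and the extra comment; other engines differ in spacing and newlines):

```js
function anonymous(x,y
/**/) {
return x + y
}
```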

Bound functions

Every single supporting engine that I’ve tested represents bound (via Function.prototype.bind) functions the same way as native functions. Yes, that means bound functions “lose” their source from string representation.

Arguably this is a reasonable thing to do; although a bit of a “wat?” when you first see it — why not use “[bound code]” instead?

Curiously, some engines (e.g. the latest WebKit) preserve the function’s original identifier and some don’t.
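For instance (the exact strings are engine-dependent):

```js
function original(a, b) { return a + b; }
var bound = original.bind(null, 1);

String(bound);
// → "function () { [native code] }"           (most engines)
// → "function original() { [native code] }"   (engines that keep the identifier)
```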

Non-standard

What about non-standard extensions? Like Mozilla’s expression closures.

Yep, those are still represented as they’re written, without function body brackets (technically a violation of Function Declaration syntax, which the MDN page on Function.prototype.toString doesn’t even mention; something to fix!).
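For instance (Mozilla-only syntax; a syntax error in other engines):

```js
// Expression closure: no brackets, the expression is the implicit return value.
var square = function(x) x * x;

String(square); // → "function (x) x * x"
```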

ES6 additions

I was almost done writing a test case when a sudden thought crossed my mind. Hold on a second… What about EcmaScript 6?!

All those new additions to the language; new syntax that changes the way functions look — classes, generators, rest params, default params, arrow functions. Won’t they affect function representation as well?

A quick test shed light on it — they do. Of course. Firefox 24+, leading the ES6 brigade, reveals the string representation of these new constructs:
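Approximately like so (illustrative; the exact whitespace and the presence of identifiers vary):

```js
String((x) => x * 2);                  // "(x) => x * 2"
String(function* gen() { yield 1; });  // "function* gen() { yield 1; }"
String(function(a = 1, ...rest) {});   // "function(a = 1, ...rest) {}"
```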

Examining ES6 spec confirms this further:

An implementation-dependent String source code representation of the this object is returned. This representation has the syntax of a FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, ClassDeclaration, ClassExpression, ArrowFunction, MethodDefinition, or GeneratorMethod depending upon the actual characteristics of the object. Note in particular that the use and placement of white space, line terminators, and semicolons within the representation String is implementation-dependent.

If the object was defined using ECMAScript code and the returned string representation is in the form of a FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, ClassDeclaration, ClassExpression, or ArrowFunction then the representation must be such that if the string is evaluated, using eval in a lexical context that is equivalent to the lexical context used to create the original object, it will result in a new functionally equivalent object. The returned source code must not mention freely any variables that were not mentioned freely by the original function’s source code, even if these “extra” names were originally in scope. If the source code string does not meet these criteria then it must be a string for which eval will throw a SyntaxError exception.

Notice how ES6 still leaves function representation implementation-dependent although clarifying that it no longer conforms to just FunctionDeclaration syntax. Also notice an interesting additional requirement — “returned source code must not mention freely any variables that were not mentioned freely by the original function’s source code” (bonus points if you understood this in less than 7 tries).

I’m unclear on how this will affect future engines and their representation. But one thing is certain. With the rise of ES6, function representation is no longer just an optional identifier followed by parameters and function body. There’s a whole lot of new stuff coming.

Regexes will, once again, have to be updated to account for all the changes (did I say it’s similar to UA sniffing? hint, hint).

Minifiers & Preprocessors

I should also mention a couple of old chestnuts that have never quite sat well with function decompilation — minifiers and preprocessors.

Minifiers like UglifyJS and preprocessors/compilers like Caja tend to tweak the hell out of source code and rename parameters. This is why Angular’s dependency injection doesn’t work with minifiers unless alternative annotation methods are used.
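For instance, a sketch of the standard documented alternatives (assuming the Angular 1.x library is loaded; nothing exhaustive):

```js
var app = angular.module('demo', []);

// Implicit annotation: Angular reads "$scope" and "$http" out of the function's
// string representation, so a minifier renaming them to a/b breaks injection.
app.controller('MainCtrl', function($scope, $http) { /* ... */ });

// Minification-safe: inline array annotation…
app.controller('MainCtrl', ['$scope', '$http', function($scope, $http) { /* ... */ }]);

// …or the $inject property.
function MainCtrl($scope, $http) { /* ... */ }
MainCtrl.$inject = ['$scope', '$http'];
app.controller('MainCtrl', MainCtrl);
```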

Perhaps not a big deal, but still a relevant issue and definitely something to keep in mind.

TL;DR & Conclusions

To sum things up: it appears that function decompilation is becoming safer but — depending on your parsing needs — it might still be unwise to rely on it exclusively.

Thinking to use it in your app/library?

Remember that:

- the representation is still implementation-dependent; whitespace, newlines, and comments can differ between engines, and some strip comments and “dead” code entirely
- bound functions are serialized like native ones, so their source is hidden
- non-standard extensions (e.g. Mozilla’s expression closures) don’t even conform to Function Declaration syntax
- ES6 generators, classes, arrow functions, and default/rest parameters change what a function’s string representation can look like
- minifiers and preprocessors rewrite source and rename parameters, so anything that depends on the original names will break

P.S. Functions with overwritten toString methods and/or Proxy.createFunction are a different kind of beast; we can consider those a special case that requires special consideration.

Special thanks to Andrea Giammarchi for providing some of the mobile tests (not available on BrowserStack).
