Book highlights: The Filter Bubble by Eli Pariser

This time I write about a book by Eli Pariser that first appeared in 2011. Its title points to its main content: The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think.

As if it were real fortune-telling, the author, already in 2011, prepares the reader to understand the perils of web personalization and its potential consequences. Now, in 2017, those consequences have materialised.

Let's remember that an interesting part of Information Security is Personal Data Privacy (if that still exists!).

As always, a little disclaimer: this collection of learning points does not replace the reading of the book, and it constitutes a very personal list of items. Let's start:

- The arrival of personalised Internet search by Google in 2009 contributed to making the user of that search a real product rather than a customer.

- The delivery of personalised search results creates, for each of us, a personal bubble in which we will live. This is great in terms of confirming our interests; however, it is not so great in terms of isolating each of us within our own bubble and system of beliefs.

- A different point, but also worth highlighting: asymmetry in email. The cost of sending an email is orders of magnitude lower than the cost of receiving and reading an email (in terms of human time devoted to it). This is the main reason why email spam exists.

- Facebook focuses on relationships among people and Google on relationships in data.
- Facebook focuses on what you share, Google on what you click.
- Both aim at the same final objective: User (product) lock-in.

- The author also talks about user behaviour as a commodity and how some companies, e.g. Acxiom, monetise it.

- Interesting fact: Google News was created as an initially easy way to curate news after 9/11.

- A fact: More voices means less trust in a given voice.
- In the US in 2011, people watched TV for 36 hours per week on average.
- Definition of TV: Unobjectionable entertainment.

- The key to keep audiences happy: Creating content in response to their likes.
- Personalised filters affect the way we think and learn.
- We tend to convert papers with lots and lots of data into "likely to be true".
- Information itself wants to be reduced to a simple statement.

- The more expert you are in a topic, the more reality-bias you have and the less successfully you will predict.

- Consuming information that conforms to our ideas is super easy. That is why we do it.
- The filter bubble shows us things, but it also hides other things from us, and we are not as compelled to learn about new things if we do not know about them.

- It is important to be able to do what you would like to do but also to know what is possible to do.
- For the time being, Internet personalisation does not capture the difference between your work self and your play self.
- There is a difference between what we watch and what we should watch.
- Profiling gives companies the ability to circumvent your rational decision making.
- Personalisation still does not distinguish signal from noise.
- If our best moments are often the most unpredictable ones, what will happen to us if our bubble is fully predictable?
- The bottom line: In the book the author mentions that we do not know the effects of this filter bubble. However, six years after its publication, we can see its real consequences in terms of fake news and isolation.

- The existence of the cloud. Personal data in the cloud, outside your computer, is much easier to search than info on your computer.
- Statement extracted from the book (published in 2011): "Personalised outreach gives better bang for the political buck".

- In the post-materialism era we buy things to express our identity, not because we need the item we buy.

- The personalised bubble makes it more difficult to get people from a community to make better collective decisions.

- Peter Thiel, American entrepreneur and PayPal co-founder, states that "freedom and democracy are no longer compatible".

- Engineers resist the idea that their work has moral or political consequences.

- Small pieces of advice: Delete your browser history every now and then. And if you dare, your cookies ;-) Use the incognito tab in your browser.

- Be aware of the power of default e.g. by default when you open the browser you do not land on the incognito tab.

- The author states that there are also possibilities to improve using this technology if companies are transparent in explaining how their filters work and how they use our data.

- Corporate responsibility is required, and probably also a kind of oversight.

- Personal data should be considered personal property.

Too much to think about in only one post!
Happy reading!

Hello to a new world

Book highlights: Willpower: Rediscovering the Greatest Human Strength by Roy Baumeister and John Tierney

This book about Willpower by Roy Baumeister and John Tierney is worth reading to prepare for the next term, especially when the to-do list is long and the leisure temptations are formidable.

In a super concise nutshell, and never replacing its reading, the points I highlight for those Information Security experts who have been following this blog for years are the following:

- A monthly plan is much more effective than a daily plan. Days go differently than planned, but months give you the time you need to achieve your goals.
- Short-term targets need to be anchored to long-term targets, otherwise they are very dangerous.
- Our will power requires energy. More specifically, it requires glucose in our brain.
- Decision making also requires energy. If you have no energy, do not make decisions at that time.
- We can train our will power. Start with baby steps, just as I recommend in my IT Security Management book.
- Being part of a community with goals similar to ours always helps to grow will power. The opposite is unfortunately also true.
- When you prepare your to-do lists, break your complex goals down into manageable activities.
- Proposal: Work with bi-weekly or monthly plans and revise them.
- For parents: Make your children participate in the creation of the family's plans and their plans (and even yours).
- If you are tired, do not decide or get exposed to tempting situations. Maybe the most important learning point of this book.

Happy will power!

The sky is the limit

Book Review: Site Reliability Engineering. How Google runs production systems

The following points come from a book by many Googlers and related colleagues, edited by Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy, titled "Site Reliability Engineering: How Google Runs Production Systems".

Disclaimer: As always, in every book review I have posted, these reviews are just invitations to read the book. Not a replacement!

"Traditionally system developers ended their task once we threw their creation into production". This brought only trouble both to the final customers and to the staff in charge of providing the service.

This book is basically Google's attempt to revamp the role of system administrator and operator in production, placing it at the same level as that of system developers.

No magic solution, just smart common sense, i.e. giving system admins in prod the possibility to improve the system themselves, to automate and to scale. The authors confirm that their proposal is a specific way of doing DevOps.

The evolution of automation goes from manual steps to externally maintained automation, both system specific and generic, then to internal automation and finally to autonomy.
How do they define reliability? "Probability that a system performs a function with no failure under stated conditions for a period of time". An outage for the SRE, when planned, is a change to improve the system, to innovate.

Service reliability hierarchy
Bottom-up: Monitoring, incident response, post-mortem/root cause analysis, testing and release procedures, capacity planning, development and product.

"Hope is not a valid strategy"
70% of outages come from changes in a live system.

Monitoring software should do the interpretation, and humans should be notified via alerts, tickets or logging (according to the criticality). No email alerts; use a dashboard with flashy colours. Nowadays monitoring is more a collection of time series (more powerful than SNMP alone), i.e. sequences of values and timestamps. These are the data source for automatically evaluated rules.
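As a rough illustration of rules evaluated automatically over a time series, here is a minimal Python sketch. The `Rule` class, the `evaluate` function, the window and threshold values and the sample data are all hypothetical names and numbers chosen for this example; they are not taken from the book or from any real monitoring system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A time series is a sequence of (timestamp_in_seconds, value) samples.
Sample = Tuple[float, float]


@dataclass
class Rule:
    """Hypothetical rule: act when the mean over a time window exceeds a threshold."""
    name: str
    window_seconds: float
    threshold: float
    action: str  # "alert", "ticket" or "log", chosen according to criticality


def evaluate(rule: Rule, series: List[Sample], now: float) -> Optional[str]:
    """Return the rule's action if it fires at time `now`, otherwise None."""
    recent = [value for ts, value in series
              if now - rule.window_seconds <= ts <= now]
    if not recent:
        return None  # no data in the window, so nothing to interpret
    mean = sum(recent) / len(recent)
    return rule.action if mean > rule.threshold else None


# Invented samples: ratio of failed requests, one sample per minute.
error_ratio = [(0, 0.01), (60, 0.02), (120, 0.20), (180, 0.30)]
page = Rule("high_error_ratio", window_seconds=120, threshold=0.10, action="alert")

print(evaluate(page, error_ratio, now=60))
print(evaluate(page, error_ratio, now=180))
```

The software, not the human, interprets the raw samples; a person only hears about it once a rule fires.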

Black box monitoring (how is the user experience?) and white box (monitoring system internals).

This way we increase the MTTF (mean time to failure) and reduce the MTTR (mean time to repair).
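To see how both figures affect the service, a common textbook approximation (an assumption here, not a formula quoted from the book) is steady-state availability = MTTF / (MTTF + MTTR). A small Python sketch with invented numbers:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Invented figures: a failure every 500 hours, 2 hours to repair.
baseline = availability(mttf_hours=500, mttr_hours=2)
faster_repair = availability(mttf_hours=500, mttr_hours=1)    # better detection and response
fewer_failures = availability(mttf_hours=1000, mttr_hours=2)  # failures are rarer

print(f"baseline:       {baseline:.4%}")
print(f"faster repair:  {faster_repair:.4%}")
print(f"fewer failures: {fewer_failures:.4%}")
```

Under this approximation, a longer time between failures and a shorter repair time both push availability up.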

Latency vs throughput
System engineers need to understand what is best for their system, the smart mix between latency (how long) and throughput (how many). Think about cost vs projected increase in revenue. Key point: Aim for the right Service Level Objective. Do not overachieve. Over-achievement in terms of availability prevents you from innovating and improving the system.
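The idea of not overachieving can be made concrete with an error budget, i.e. the amount of unavailability that the Service Level Objective still tolerates (1 minus the SLO). The following Python sketch is only an illustration: the function names, the 30-day period and the downtime figure are assumptions made for this example.

```python
def error_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Minutes of unavailability that an availability SLO allows over a period."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1], e.g. 0.999")
    return (1 - slo) * period_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, period_days: int = 30) -> float:
    """Budget left after the downtime already spent; negative means the SLO is missed."""
    return error_budget_minutes(slo, period_days) - downtime_minutes


# A 99.9% SLO over 30 days allows roughly 43 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
# After a hypothetical 30-minute outage there is still some budget left for risky changes.
print(round(budget_remaining(0.999, downtime_minutes=30), 1))
```

A team that never spends any of this budget is delivering more availability than the objective asks for, and is probably paying for it with slower innovation.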

Avoid toil
Manual, repetitive work needs to be automated. Monitoring data not being used is a candidate for removal. Blending together too many results is complex. In an SRE team of 10 to 12 people, 1 or 2 are devoted to monitoring.

Release engineering
It also includes config management at the beginning of the product lifecycle. Frequent releases result in fewer changes between versions. Distinguish between inherent complexity and accidental complexity and avoid the latter.
In software, less is more (and more expensive). Versioning APIs is a good idea.

Incident management teams
Multi-site teams incur a communication overhead. How do you know the team is in the sweet spot? When handling an incident takes 6 hours, including root cause analysis and post-mortem. Prefer the rational, focused and cognitive (procedure-based) process over the intuitive, fast and automatic one. Provide clear escalation paths and follow a blameless postmortem culture. Use a web-based incident management tool.

Avoid operational overhead. If there are too many alerts, give the pager back to the initial developer. Prepare for outages, drill them and test the what-ifs. Team members should be on-call at least once or twice per quarter.

Separation of duties in incident management: ops (rotating roles among teams and time zones), communication and planning.

Testing is continuous. Testing reduces the uncertainty and the loss of reliability that each change brings. Include configuration tests.

Team size
It should not scale directly with service growth.

Best practices
Fail safely. Make progressive rollouts. Define your error/bug budget. Follow the monitoring principles (hierarchy), make post-mortems and include capacity planning.

Look not only at mean latency but also at distribution of latencies. Prevent server overload by means of built-in graceful degradation.
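A small Python sketch shows why the mean alone can mislead. The latency values are invented for illustration, and the `percentile` helper is a simple nearest-rank implementation written for this example, not a function from the book.

```python
import math
import statistics
from typing import List


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented sample: 95 fast requests and 5 very slow ones, in milliseconds.
latencies_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

Here the mean describes no real request: the typical one is far faster, and one user in twenty waits far longer.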

Leader election is a reformulation of the distributed asynchronous consensus problem. It cannot be solved correctly using heartbeats (but rather with replicated state machines). A Byzantine failure is e.g. an incorrect message due to a bug or a malicious activity.

Production readiness review
An early involvement is desired. SRE can only work with frameworks to scale. Data integrity is the means, data availability is the goal. 

Rocky landscape
Happy reliable reading!
Interested in the mindmap of it? Here you are part 1.

And part 2.

Book Review: Practical Data Science with R by Nina Zumel and John Mount

This is a very brief collection of points extracted from this book titled "Practical Data Science with R". For those starting in the field of Data Science, it is a recommendable foundational reference.

The main parts: An introduction to Data Science, modelling methods and delivering results.

As always, an important disclaimer when talking about a book review: The reading of this very personal and non-comprehensive list of points, mostly taken verbatim from the book, by no means replaces the reading of the book it refers to; on the contrary, this post is an invitation to read the entire work.

Part 1 - Intro to Data Science

I would highlight the method the authors propose to deal with data investigations:

- Define the goal - What problem are you solving?
- Collect and manage data - What info do you need?
- Build the model - Find patterns in data that lead to a solution
- Evaluate and critique the model - Does the model solve my problem?
- Present results and document - Establish that you can solve the data problem and explain how
- Deploy the model - Deploy the model to solve the problem in the real world.

Part 2 - Models

Common classification methods include the Naive Bayes classifier, decision trees, logistic regression and support vector machines.
To forecast is to assign a probability (the key is how to map data into a model).

Model types: Classification, scoring, probability estimation, ranking and clustering.
For most model evaluations, it is usual to compute one or two summary scores using a few ideal models: a null model, a Bayes rate model and the best single variable model.
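To make the comparison against simple reference models concrete, here is a minimal sketch. It is written in Python rather than R, uses accuracy as the summary score and works on invented toy data; the function names and the data are assumptions for this example and do not come from the book.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def null_model_accuracy(labels: Sequence[Hashable]) -> float:
    """Accuracy of a null model that always predicts the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def single_variable_accuracy(feature: Sequence[Hashable],
                             labels: Sequence[Hashable]) -> float:
    """Accuracy of predicting, for each level of one variable, its most common label.

    Measured on the same data used to build the model, so it is optimistic.
    """
    by_level = defaultdict(Counter)
    for level, label in zip(feature, labels):
        by_level[level][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_level.values())
    return correct / len(labels)


# Invented toy data: subscription plan and whether the customer churned.
plan = ["free", "free", "free", "free",
        "paid", "paid", "paid", "paid", "paid", "paid"]
churned = [True, True, True, False,
           False, False, False, False, True, False]

print("null model:           ", null_model_accuracy(churned))
print("single variable model:", single_variable_accuracy(plan, churned))
```

A more complicated model is only worth its cost if it clearly beats both of these scores on data it has not seen.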

Evaluating scoring models:
- Always try single variable models before trying more complicated techniques.
- Single variable modelling techniques give a useful start on variable selection.
- Consider decision trees, nearest neighbour and naive Bayes models as basic data memorization techniques.

- Functional models allow you to better explore how changes in inputs affect predictions.
- Linear regression is a good first technique to model quantities.
- Logistic regression is a good first technique to model probabilities.
- Models with simple forms come with very powerful summaries and diagnostics.
- Unsupervised methods find structure (e.g. discovered clusters, discovered rules) in the data, often as a prelude to predictive modelling.

Part 3 - Delivering results 

Nowadays information systems are built on large databases. Most systems are online, mistakes in terms of data interpretation are common and almost none of these systems is concerned with cause.

Enjoy the data forest

WannaCry-related interim timeline

Let me share a timeline I constructed regarding WannaCry over the last few days. The interesting point I shared with some colleagues was that the infection vector of patient zero (or patients zero) has not been referenced or described yet.

15th February 2017 Microsoft cancels its monthly patching for that month

9th March 2017 Wikileaks press release regarding Vault7, "the largest-ever publication of confidential documents on the agency" according to Wikileaks.

14th March 2017 Microsoft publishes security update MS17-010 for SMB Server

14th April 2017 The Shadow Brokers release some exploits attributed to the Equation Group, EternalBlue among them. EternalBlue took advantage of the vulnerability that Microsoft patch MS17-010 fixed.

14th April 2017 Microsoft publishes their triage analysis on the exploits

15th April 2017 Security companies analyse exploits. One example of the analysis of EternalBlue is the following:

15th April 2017 Some news sites start to wonder how come that the patch existed before the release e.g.

12th May 2017 WannaCry appears in the wild

Some sources mention that the infection vector was a phishing email

However, there is no analysis yet of that phishing email, its attachment or its modus operandi in general.

Update 1: Response and proposals from Microsoft

Rocky days